Appendix 14D: Artifact QA checklist (big picture — what and why before you share results)

Chapter 14 produces an “analysis pack” of outputs: a driver table, a short memo, model summaries, and figures. These artifacts are designed to be reproducible and explainable.

But “explainable” does not automatically mean “safe to share.”

This appendix is a professional QA checklist for accountants and analysts — the “what and why” you should verify before you:

send the memo to leadership,
paste coefficients into a forecast model,
brief operations on “what drives COGS,” or
publish results in a report.

The goal is not perfection. The goal is trustworthy communication: your results should be accurate, traceable, and framed with the right guardrails.

What QA protects you from

In real accounting/FP&A work, the biggest failure modes are rarely “math errors.” They tend to be:

Measurement errors (wrong sign, wrong month, wrong definition)
Broken lineage (numbers cannot be traced back to a credible source)
Over-interpretation (treating a driver lens as causation)
Model fragility (one month or one event dominates the fit)
Reporting drift (figures and memo say different things)
Reproducibility gaps (nobody can recreate what you did next week)

Chapter 14 exists to teach a regression workflow that behaves like accounting work: disciplined, auditable, and fit for planning conversations.

This QA checklist is the “controls layer” on top of the analysis pack.

Inputs and outputs in scope

This checklist assumes the standard Chapter 14 artifacts:

ch14_driver_table.csv
ch14_regression_memo.md
ch14_regression_summary.json
ch14_regression_design.json
ch14_figures_manifest.csv
figures/ (PNG charts referenced by the manifest)

and that the dataset (NSO v1) exists locally under:

data/synthetic/nso_v1/

Quick start: the 5-minute QA pass

If you only have five minutes, do these checks in this order:

Open ch14_driver_table.csv and confirm months are complete and sorted.
Confirm sign conventions: - units_sold should be positive (for the Chapter 14 driver table definition).
Read ch14_regression_memo.md and check the “story” matches your intuition: - baseline vs rate interpretation - driver lens, not causation
Open the figures listed in ch14_figures_manifest.csv: - does the relationship look reasonably linear? - any obvious outlier month dominating?
Confirm all expected artifacts exist (no missing files) and rerun:
```
make business-ch14
```

If anything looks off in the 5-minute pass, stop and do the deeper pass below.

Deeper QA: the 30-minute professional pass

This section is organized as a set of gates. You do not need to “check everything forever.” You are trying to reach a reasonable professional standard before sharing results.

Gate 1 — Provenance and reproducibility

Why this matters: if you can’t recreate the results, you can’t defend them.

Checks:

Record (or confirm) the dataset and run command used. In this project, the most common baseline run is:
```
make business-nso-sim
make business-validate
make business-ch14
```
Confirm that artifacts are generated (not manually edited). The outputs live under:
- outputs/track_d/track_d/
Confirm you can regenerate cleanly and get the same results (same dataset + seed).

Red flags:

You cannot identify which dataset folder was used.
A teammate cannot reproduce your numbers on the same commit.

Gate 2 — Driver table integrity (the foundation)

Why this matters: regression can look “precise” even when the inputs are wrong. The driver table is the “data contract” of Chapter 14.

Open: ch14_driver_table.csv

Checks:

Month coverage - Are months continuous (no unexpected gaps)? - Is the time window what you expect (e.g., last 12/24 months)?
Sorting and duplicates - One row per month. - No duplicated months. - Months are in chronological order.
Sign conventions - units_sold is positive (per Chapter 14 definition). - Revenue and COGS have the expected sign convention for your statements.
Magnitude sanity - Units, revenue, and COGS are in plausible ranges. - No single month is wildly out of scale unless the business story explains it.
Lineage plausibility - Units sold came from inventory movements. - Invoice count came from A/R events. - Revenue and COGS came from monthly income statement lines.

Common errors this catches:

Units sold computed with the wrong sign
Mismatched month keys (e.g., revenue in one month, units in another)
Missing months due to filtering
“Accidental duplicates” caused by grouping logic

Gate 3 — Model sanity (baseline vs rate)

Why this matters: accountants often use regression to separate “fixed vs variable” or to build planning rates. If slopes/intercepts are not plausible, your plan will be wrong.

Open: ch14_regression_summary.json and/or read ch14_regression_memo.md

Checks:

Direction makes sense - Revenue slope vs units should be positive. - COGS slope vs units should be positive. - If a slope is negative, treat it as a stop sign until explained.
Intercept (baseline) plausibility - For revenue: a large positive intercept may indicate timing or non-unit revenue. - For COGS: a positive intercept might reflect baseline costs, but it should be explainable.
Rate plausibility - Revenue-per-unit lens: does it roughly match a blended selling price you’d believe? - COGS-per-unit lens: does it roughly match a blended unit cost you’d believe?
R² interpretation discipline - R² is not “truth”; it’s “how much of the variance the model explains in-sample.” - High R² may also reflect seasonality/timing; it does not prove causation.

Common errors this catches:

A driver computed incorrectly yields implausible slopes
A single outlier month inflates or distorts fit
A mismatch between “what the memo claims” and the actual coefficients

Gate 4 — Outliers and dominance (is one month driving everything?)

Why this matters: a fit can look good but be driven by one extreme point. That’s dangerous for planning.

Checks:

Open the figures referenced by ch14_figures_manifest.csv and look for: - a single month far from the rest, - a fitted line that “aims” mainly at one point, - clusters that suggest regime changes (early months behave differently than later months).
Optional: do a “remove-one-month” thought experiment: - If one month is extreme, ask: would the slope change drastically if that month disappeared?

Interpretation outcome:

If one point dominates, you can still share the analysis — but you must say: “Results are sensitive to month X; treat slopes as tentative.”

Gate 5 — Residual pattern check (is the driver lens stable?)

Why this matters: patterned residuals often indicate missing drivers, mix shifts, or nonlinearity.

Checks:

Look for systematic patterns in the “actual vs predicted” style figure(s): - consecutive months consistently above/below prediction, - seasonal shape not captured by the model, - “break point” behavior (model works then stops working).

How to respond professionally:

If residuals look random-ish, your driver lens is likely stable enough for planning.
If residuals show patterns, share results with explicit caveats: “This suggests mix shifts/seasonality; consider segmentation or seasonality indicators.”

Gate 6 — Artifact completeness and internal consistency

Why this matters: leadership will ask “where did that number come from?” You want your artifacts to agree with each other.

Checks:

All expected files exist in the outputs folder.
ch14_figures_manifest.csv lists figure files that actually exist under figures/.
The memo’s described slopes/baseline story matches the JSON summary.
The “design contract” matches the chapter narrative (models m1/m2/m3 as documented).

This is where you catch subtle errors like:

The code changed but the memo language wasn’t updated
Figures were generated from an older run
A refactor changed output paths

Gate 7 — Communication guardrails (what you are allowed to claim)

Why this matters: the most common professional failure is claiming causation or certainty.

Before you share the memo, confirm it clearly communicates:

Regression here is a driver lens, not causation proof.
Slopes are planning rates, not immutable laws.
Structural changes (pricing, mix, capacity, policy) can break the relationship.
Residuals represent “unexplained movement” that may require investigation.

A safe, professional phrasing pattern is:

“Given the recent history in this dataset, the implied rate is…”
“If units move by X, the model implies outcome moves by about Y, subject to residual variation.”
“This month is an outlier relative to the driver story; investigate…”
“This is useful for planning; not evidence of causality.”

Closing note: QA is part of the method, not a separate chore

In Track D, the method is not just “run regression.” The method is:

measurement discipline,
reproducible artifacts,
explainable coefficients,
and professional communication guardrails.

Appendix 14C tells you what each artifact is. Appendix 14D tells you how to trust and communicate those artifacts responsibly.

Appendix 14D: Artifact QA checklist (big picture — what and why before you share results)

What QA protects you from

Inputs and outputs in scope

Quick start: the 5-minute QA pass

Deeper QA: the 30-minute professional pass

Gate 1 — Provenance and reproducibility

Gate 2 — Driver table integrity (the foundation)

Gate 3 — Model sanity (baseline vs rate)

Gate 4 — Outliers and dominance (is one month driving everything?)

Gate 5 — Residual pattern check (is the driver lens stable?)

Gate 6 — Artifact completeness and internal consistency

Gate 7 — Communication guardrails (what you are allowed to claim)

Sharing checklist (go / no-go)

Closing note: QA is part of the method, not a separate chore