Appendix 14B: NSO v1 data dictionary cheat sheet (table → grain → keys → joins → checks)

This appendix is a compact “reference page” for the NSO v1 synthetic dataset used across Track D and specifically in Chapter 14.

Where this data lives

NSO v1 is generated locally (and is gitignored):

  • Default dataset folder: data/synthetic/nso_v1/

Generate it with:

make business-nso-sim
make business-validate

Tip: The most important idea is lineage.

  • Subledgers and operational tables capture activity (A/R, A/P, inventory, payroll…).

  • The GL journal captures double-entry truth.

  • Monthly statements summarize the GL and are used for trend + driver analysis.

Core accounting tables (GL + statements)

Core accounting tables

Table

Grain (one row per…)

Key fields (high value columns)

Common joins

Common checks

chart_of_accounts.csv

Account (account_id)

account_id, account_name, account_type, normal_side

  • account_idgl_journal.account_id

  • account_idtrial_balance_monthly.account_id

  • Unique account_id

  • normal_side consistent with account type rules

gl_journal.csv

Journal line (txn_id + account_id)

txn_id, date, doc_id, description, account_id, debit, credit

  • account_id → COA

  • txn_id ↔ many event tables (A/R, A/P, inventory, payroll, tax, debt, equity)

  • txn_idbank_statement.gl_txn_id

  • For each txn_id: sum(debit) == sum(credit)

  • debit/credit non-negative; one side usually zero

trial_balance_monthly.csv

Month × account (month + account_id)

month, account_id, debit, credit, ending_side, ending_balance

  • account_id → COA

  • month → monthly statements (by month)

  • For each month: total debits == total credits (sanity)

  • Ending balances stable vs GL-derived rebuild (when applicable)

statements_is_monthly.csv

Month × line (month + line)

month, line, amount (lines include Sales Revenue, COGS, Operating Expenses, Net Income)

  • month → driver tables (e.g., Chapter 14)

  • month → cash flow bridge (by month)

  • Gross Profit ≈ Revenue − COGS

  • Net Income ≈ Gross Profit − OpEx (for this synthetic model)

statements_bs_monthly.csv

Month × line (month + line)

month, line, amount (assets/liabilities/equity totals and key accounts)

  • month → cash flow bridge

  • month → rollforwards (A/R, A/P, inventory, payables)

  • Total Assets == Total Liabilities + Total Equity

  • Ending Cash agrees with cash flow “Ending Cash (from bridge)”

statements_cf_monthly.csv

Month × line (month + line)

month, line, amount (CFO/CFI/CFF bridge components + cash rollforward)

  • month → BS and IS (by month)

  • Ending Cash (from bridge) == Ending Cash (balance sheet)

  • Net Change in Cash == Ending − Beginning

Operational and subledger event tables

Operational / subledger tables

Table

Grain (one row per…)

Key fields

Common joins

Common checks

inventory_movements.csv

Inventory movement event (txn_id)

month, txn_id, date, sku, movement_type, qty, unit_cost, amount

  • txn_idgl_journal.txn_id (inventory/COGS postings)

  • month → IS/BS (inventory + COGS by month)

  • sale_issue qty sign convention consistent (units sold often computed as -sum(qty))

  • amount ≈ qty × unit_cost (within rounding rules)

ar_events.csv

A/R event (invoice or collection) (txn_id)

month, txn_id, date, customer, invoice_id, event_type, amount, ar_delta, cash_received

  • txn_idgl_journal.txn_id

  • invoice_idgl_journal.doc_id (common pattern)

  • month → BS line “Accounts Receivable” (rollforward)

  • Sum of ar_delta over time matches AR balance movement

  • Invoice vs collection behavior consistent (cash_received vs ar_delta)

ap_events.csv

A/P event (invoice or payment) (txn_id)

month, txn_id, date, vendor, invoice_id, event_type, amount, ap_delta, cash_paid

  • txn_idgl_journal.txn_id

  • invoice_idgl_journal.doc_id (common pattern)

  • month → BS line “Accounts Payable” (rollforward)

  • Sum of ap_delta over time matches AP balance movement

  • Payment rows show cash_paid behavior consistent with ap_delta

payroll_events.csv

Payroll event (txn_id)

month, txn_id, date, event_type, gross_wages, employee_withholding, employer_tax, cash_paid

  • txn_idgl_journal.txn_id

  • month → BS lines “Wages Payable” and “Payroll Taxes Payable”

  • Payables deltas reconcile with BS payables lines

  • Cash paid timing consistent with accrual/payment/remittance cycle

sales_tax_events.csv

Sales tax event (collection or remittance) (txn_id)

month, txn_id, date, event_type, taxable_sales, tax_amount, cash_paid

  • txn_idgl_journal.txn_id

  • month → BS line “Sales Tax Payable”

  • Payable deltas reconcile with BS sales tax payable

  • Remittance reduces payable; collection increases payable

bank_statement.csv

Bank transaction (bank_txn_id)

month, bank_txn_id, posted_date, description, amount, gl_txn_id

  • gl_txn_idgl_journal.txn_id (link bank item to GL posting)

  • month → cash rollforward checks

  • Bank items link to GL (gl_txn_id exists)

  • Aggregate cash activity consistent with CF/BS cash movement (in this synthetic model)

Schedules and supporting tables

Schedules / supporting tables

Table

Grain (one row per…)

Key fields

Common joins

Common checks

fixed_assets.csv

Asset (asset_id)

asset_id, asset_name, in_service_month, cost, useful_life_months, salvage_value, method

  • asset_iddepreciation_schedule.asset_id

  • in_service_month → depreciation start month logic

  • Unique asset_id

  • Life and salvage values sane (non-negative)

depreciation_schedule.csv

Asset × month (asset_id + month)

month, asset_id, dep_expense, accum_dep, net_book_value

  • asset_id → fixed_assets

  • month → IS/BS (Depreciation expense / Accumulated depreciation / Net PP&E)

  • net_book_value decreases logically; never below salvage (for model rules)

  • Accumulated depreciation monotone increasing

debt_schedule.csv

Loan × month (loan_id + month)

month, loan_id, txn_id, beginning_balance, payment, interest, principal, ending_balance

  • txn_idgl_journal.txn_id (debt-related postings)

  • month → BS line “Notes Payable” and CF financing section

  • Beginning + principal logic consistent: ending ≈ beginning − principal (within rounding)

  • Payment ≈ interest + principal (within rounding)

equity_events.csv

Equity event (txn_id)

month, txn_id, date, event_type, amount

  • txn_idgl_journal.txn_id

  • month → BS equity lines and CF (owner contribution/draw)

  • Contribution increases equity; draw decreases equity (by sign convention)

nso_v1_meta.json

Dataset metadata (single document)

Seed, date window, scenario notes

  • Not joined; used for reproducibility and audit trail

  • Record seed/month window; keep alongside analyses for “what data was this?” tracking

Quick join patterns (most common in Track D)

These are the joins you will use constantly:

gl_journal.account_id  -> chart_of_accounts.account_id
trial_balance_monthly.account_id -> chart_of_accounts.account_id

ar_events.txn_id       -> gl_journal.txn_id
ap_events.txn_id       -> gl_journal.txn_id
inventory_movements.txn_id -> gl_journal.txn_id
payroll_events.txn_id  -> gl_journal.txn_id
sales_tax_events.txn_id -> gl_journal.txn_id
debt_schedule.txn_id   -> gl_journal.txn_id
equity_events.txn_id   -> gl_journal.txn_id
bank_statement.gl_txn_id -> gl_journal.txn_id

(Monthly alignment)
month -> statements_is_monthly.month / statements_bs_monthly.month / statements_cf_monthly.month

Quick “trust gates” (common checks you should do first)

If you only have time for a few checks, do these:

  1. GL transactions balance - For each txn_id in gl_journal.csv: sum(debit) == sum(credit)

  2. Balance sheet equation holds - In statements_bs_monthly.csv: Total Assets == Total Liabilities + Total Equity

  3. Cash rollforward agrees - In statements_cf_monthly.csv: Ending Cash (from bridge) == Ending Cash (balance sheet)

  4. Subledger rollforward sanity - AR/AP payables deltas move in the same direction as the corresponding BS lines.

These checks are why Track D can safely build “explainable regression” on top of the dataset: we’re not modeling noise from broken measurement.