Appendix 14B: NSO v1 data dictionary cheat sheet (table → grain → keys → joins → checks)
This appendix is a compact “reference page” for the NSO v1 synthetic dataset used across Track D and specifically in Chapter 14.
Where this data lives
NSO v1 is generated locally (and is gitignored):
Default dataset folder:
data/synthetic/nso_v1/
Generate it with:
make business-nso-sim
make business-validate
Tip: The most important idea is lineage.
Subledgers and operational tables capture activity (A/R, A/P, inventory, payroll…).
The GL journal captures double-entry truth.
Monthly statements summarize the GL and are used for trend + driver analysis.
Core accounting tables (GL + statements)
Table |
Grain (one row per…) |
Key fields (high value columns) |
Common joins |
Common checks |
|---|---|---|---|---|
|
Account ( |
|
|
|
|
Journal line ( |
|
|
|
|
Month × account ( |
|
|
|
|
Month × line ( |
|
|
|
|
Month × line ( |
|
|
|
|
Month × line ( |
|
|
|
Operational and subledger event tables
Table |
Grain (one row per…) |
Key fields |
Common joins |
Common checks |
|---|---|---|---|---|
|
Inventory movement event ( |
|
|
|
|
A/R event (invoice or collection) ( |
|
|
|
|
A/P event (invoice or payment) ( |
|
|
|
|
Payroll event ( |
|
|
|
|
Sales tax event (collection or remittance) ( |
|
|
|
|
Bank transaction ( |
|
|
|
Schedules and supporting tables
Table |
Grain (one row per…) |
Key fields |
Common joins |
Common checks |
|---|---|---|---|---|
|
Asset ( |
|
|
|
|
Asset × month ( |
|
|
|
|
Loan × month ( |
|
|
|
|
Equity event ( |
|
|
|
|
Dataset metadata (single document) |
Seed, date window, scenario notes |
|
|
Quick join patterns (most common in Track D)
These are the joins you will use constantly:
gl_journal.account_id -> chart_of_accounts.account_id
trial_balance_monthly.account_id -> chart_of_accounts.account_id
ar_events.txn_id -> gl_journal.txn_id
ap_events.txn_id -> gl_journal.txn_id
inventory_movements.txn_id -> gl_journal.txn_id
payroll_events.txn_id -> gl_journal.txn_id
sales_tax_events.txn_id -> gl_journal.txn_id
debt_schedule.txn_id -> gl_journal.txn_id
equity_events.txn_id -> gl_journal.txn_id
bank_statement.gl_txn_id -> gl_journal.txn_id
(Monthly alignment)
month -> statements_is_monthly.month / statements_bs_monthly.month / statements_cf_monthly.month
Quick “trust gates” (common checks you should do first)
If you only have time for a few checks, do these:
GL transactions balance - For each
txn_idingl_journal.csv: sum(debit) == sum(credit)Balance sheet equation holds - In
statements_bs_monthly.csv: Total Assets == Total Liabilities + Total EquityCash rollforward agrees - In
statements_cf_monthly.csv: Ending Cash (from bridge) == Ending Cash (balance sheet)Subledger rollforward sanity - AR/AP payables deltas move in the same direction as the corresponding BS lines.
These checks are why Track D can safely build “explainable regression” on top of the dataset: we’re not modeling noise from broken measurement.