Ch 02 — Double-entry and the general ledger as a database
PyPI workbook run (Track D)
From inside your Track D workbook folder (created by pystatsv1 workbook init --track d --dest ...), run:
pystatsv1 workbook run |trackd_run|
Outputs are written under outputs/track_d/ by default.
If you’re unsure what a file is for, start with Track D Outputs Guide.
To see the full chapter-by-chapter run map (D00–D23), see Track D chapter index (PyPI).
Optional: write to a custom output folder:
pystatsv1 workbook run |trackd_run| --outdir outputs/track_d_custom
Interpretation prompts (quick self-check):
What is the accounting or business measurement goal in this chapter?
Which invariant/check would catch a “numbers look fine but are wrong” mistake here?
Why this matters (for accountants)
Your general ledger (GL) is a database of business events. If it’s structured well (and posted consistently), analysis becomes fast, reliable, and repeatable — and forecasting later becomes dramatically easier.
This chapter translates “debits and credits” into a data model: rows, keys, validation rules, and tidy exports.
Learning objectives
Explain debits and credits in plain language (double-entry logic).
Interpret the chart of accounts (COA) as an analytics schema.
Produce an analysis-ready GL export with signed amounts and statement mapping.
Run controls-aware checks that catch common posting errors early.
Concise refresher: double-entry and the GL (RTD-native)
Double-entry accounting means every business event is recorded with equal total debits and total credits. This is a built-in consistency check that helps you catch errors early and supports a clean audit trail.
Debits and credits are not “good vs bad.” They are directional signs that depend on the type of account:
Assets normally carry a debit balance (cash, inventory, equipment).
Liabilities normally carry a credit balance (accounts payable, loans).
Equity normally carries a credit balance (owner’s equity, retained earnings).
Revenue normally carries a credit balance (sales).
Expenses normally carry a debit balance (rent, payroll expense).
A single transaction is typically stored as multiple GL lines that share a common
transaction identifier (txn_id). Example (cash sale):
Debit Cash
Credit Sales Revenue
If you later discover an error, best practice is to correct with a new entry (or a reversing entry + replacement) rather than deleting history.
Chart of Accounts (COA) is your ledger’s schema: it defines accounts, rollups, and stable reporting structure.
Trial Balance (TB) is the monthly “checksum”: it summarizes debits/credits by account and helps you confirm internal consistency before producing financial statements.
Common posting errors and the control that catches them
These are the problems that silently destroy analytics (and later, forecasts) — and the simplest control to prevent them:
Unbalanced transactions (debits ≠ credits) - Control: per-
txn_idbalancing check; block posting or flag for review.Wrong account selection / miscoding - Control: COA governance (clear names, fewer ambiguous buckets) + variance review.
Wrong period (cutoff errors) - Control: month-end cutoff checklist; review large late entries.
Swapped debit/credit direction - Control: “normal balance” logic + automated reasonableness checks.
Duplicate postings - Control: duplicate detection on (date, amount, counterparty, doc_id) patterns.
In this chapter’s lab, we encode several of these as automated checks.
Accounting Connection (PDF refresher)
Refreshes: double-entry, general ledger, bookkeeper role.
Dataset tables used (LedgerLab core)
This chapter continues with the small LedgerLab core ledger dataset.
The larger NSO v1 running case (data/synthetic/nso_v1) is introduced in later chapters.
chart_of_accounts.csv(COA “schema”: account_id, type, normal side)gl_journal.csv(transaction detail lines: debits, credits, doc IDs)trial_balance_monthly.csv(monthly checksum summary)
PyStatsV1 lab (Run it)
If you haven’t generated the LedgerLab core dataset yet:
make business-sim
Run Chapter 2:
make business-ch02
Key artifacts written to outputs/track_d:
business_ch02_summary.json(checks + quick metrics)business_ch02_gl_tidy.csv(analysis-ready export: signed amounts + mapping)business_ch02_trial_balance.csv(recomputed from the GL)business_ch02_account_rollup.csv(rollup by account type)business_ch02_tb_by_account.png(plot)
What the automated checks verify (exactly)
When you run make business-ch02, the script prints these checks:
Chart of Accounts (COA) checks
coa_account_ids_unique: eachaccount_idappears exactly once (a true schema).coa_account_types_valid: account types are restricted to Asset, Liability, Equity, Revenue, Expense. (See also:coa_bad_account_typesif anything is invalid.)coa_normal_sides_valid: normal sides are only Debit or Credit. (See also:coa_bad_normal_sidesif anything is invalid.)
GL integrity checks
gl_account_ids_in_coa: every GLaccount_idexists in the COA. (See also:gl_missing_account_idsto list offenders.)gl_debits_nonnegativeandgl_credits_nonnegative: prevents negative postings that usually indicate sign mistakes or malformed exports.
Double-entry transaction check
transactions_balanced: for everytxn_id, sum(debits) == sum(credits). Companion diagnostics: -n_transactions: number of transactions observed -n_unbalanced: number of transactions failing the rule -max_abs_diff: worst imbalance magnitude (0.0 is ideal)
Trial balance tie-out
trial_balance_matches_source: trial balance recomputed from detail GL lines matches the providedtrial_balance_monthly.csv.Diagnostic:
trial_balance_max_abs_diff: largest absolute difference (floating-point noise close to 0 is normal).
Interpretation & decision memo
What GL structure decisions help forecasting later? (stable account definitions, consistent dimensions, clear statement mapping)
What controls prevent miscoding and reclass churn? (transaction balancing, COA governance, tie-outs, cutoff discipline)
End-of-chapter problems
Post a set of transactions; verify debits=credits by transaction ID.
Redesign a messy COA into an analysis-friendly COA (fewer ambiguous buckets).
Propose a month-end checklist that prevents recurring coding errors.
Define your signed-amount convention and explain how it affects reporting.
Textbook alignment notes
Textbook Part A: Chapters 1–3.