Ch 02 — Double-entry and the general ledger as a database

PyPI workbook run (Track D)

From inside your Track D workbook folder (created by pystatsv1 workbook init --track d --dest ...), run:

pystatsv1 workbook run |trackd_run|

Outputs are written under outputs/track_d/ by default. If you’re unsure what a file is for, start with Track D Outputs Guide.

To see the full chapter-by-chapter run map (D00–D23), see Track D chapter index (PyPI).

Optional: write to a custom output folder:

pystatsv1 workbook run |trackd_run| --outdir outputs/track_d_custom

Interpretation prompts (quick self-check):

  • What is the accounting or business measurement goal in this chapter?

  • Which invariant/check would catch a “numbers look fine but are wrong” mistake here?

Why this matters (for accountants)

Your general ledger (GL) is a database of business events. If it’s structured well (and posted consistently), analysis becomes fast, reliable, and repeatable — and forecasting later becomes dramatically easier.

This chapter translates “debits and credits” into a data model: rows, keys, validation rules, and tidy exports.

Learning objectives

  • Explain debits and credits in plain language (double-entry logic).

  • Interpret the chart of accounts (COA) as an analytics schema.

  • Produce an analysis-ready GL export with signed amounts and statement mapping.

  • Run controls-aware checks that catch common posting errors early.

Concise refresher: double-entry and the GL (RTD-native)

Double-entry accounting means every business event is recorded with equal total debits and total credits. This is a built-in consistency check that helps you catch errors early and supports a clean audit trail.

Debits and credits are not “good vs bad.” They are directional signs that depend on the type of account:

  • Assets normally carry a debit balance (cash, inventory, equipment).

  • Liabilities normally carry a credit balance (accounts payable, loans).

  • Equity normally carries a credit balance (owner’s equity, retained earnings).

  • Revenue normally carries a credit balance (sales).

  • Expenses normally carry a debit balance (rent, payroll expense).

A single transaction is typically stored as multiple GL lines that share a common transaction identifier (txn_id). Example (cash sale):

  • Debit Cash

  • Credit Sales Revenue

If you later discover an error, best practice is to correct with a new entry (or a reversing entry + replacement) rather than deleting history.

Chart of Accounts (COA) is your ledger’s schema: it defines accounts, rollups, and stable reporting structure.

Trial Balance (TB) is the monthly “checksum”: it summarizes debits/credits by account and helps you confirm internal consistency before producing financial statements.

Common posting errors and the control that catches them

These are the problems that silently destroy analytics (and later, forecasts) — and the simplest control to prevent them:

  1. Unbalanced transactions (debits ≠ credits) - Control: per-txn_id balancing check; block posting or flag for review.

  2. Wrong account selection / miscoding - Control: COA governance (clear names, fewer ambiguous buckets) + variance review.

  3. Wrong period (cutoff errors) - Control: month-end cutoff checklist; review large late entries.

  4. Swapped debit/credit direction - Control: “normal balance” logic + automated reasonableness checks.

  5. Duplicate postings - Control: duplicate detection on (date, amount, counterparty, doc_id) patterns.

In this chapter’s lab, we encode several of these as automated checks.

Accounting Connection (PDF refresher)

Refreshes: double-entry, general ledger, bookkeeper role.

Dataset tables used (LedgerLab core)

This chapter continues with the small LedgerLab core ledger dataset. The larger NSO v1 running case (data/synthetic/nso_v1) is introduced in later chapters.

  • chart_of_accounts.csv (COA “schema”: account_id, type, normal side)

  • gl_journal.csv (transaction detail lines: debits, credits, doc IDs)

  • trial_balance_monthly.csv (monthly checksum summary)

PyStatsV1 lab (Run it)

If you haven’t generated the LedgerLab core dataset yet:

make business-sim

Run Chapter 2:

make business-ch02

Key artifacts written to outputs/track_d:

  • business_ch02_summary.json (checks + quick metrics)

  • business_ch02_gl_tidy.csv (analysis-ready export: signed amounts + mapping)

  • business_ch02_trial_balance.csv (recomputed from the GL)

  • business_ch02_account_rollup.csv (rollup by account type)

  • business_ch02_tb_by_account.png (plot)

What the automated checks verify (exactly)

When you run make business-ch02, the script prints these checks:

Chart of Accounts (COA) checks

  • coa_account_ids_unique: each account_id appears exactly once (a true schema).

  • coa_account_types_valid: account types are restricted to Asset, Liability, Equity, Revenue, Expense. (See also: coa_bad_account_types if anything is invalid.)

  • coa_normal_sides_valid: normal sides are only Debit or Credit. (See also: coa_bad_normal_sides if anything is invalid.)

GL integrity checks

  • gl_account_ids_in_coa: every GL account_id exists in the COA. (See also: gl_missing_account_ids to list offenders.)

  • gl_debits_nonnegative and gl_credits_nonnegative: prevents negative postings that usually indicate sign mistakes or malformed exports.

Double-entry transaction check

  • transactions_balanced: for every txn_id, sum(debits) == sum(credits). Companion diagnostics: - n_transactions: number of transactions observed - n_unbalanced: number of transactions failing the rule - max_abs_diff: worst imbalance magnitude (0.0 is ideal)

Trial balance tie-out

  • trial_balance_matches_source: trial balance recomputed from detail GL lines matches the provided trial_balance_monthly.csv.

    Diagnostic: trial_balance_max_abs_diff: largest absolute difference (floating-point noise close to 0 is normal).

Interpretation & decision memo

  • What GL structure decisions help forecasting later? (stable account definitions, consistent dimensions, clear statement mapping)

  • What controls prevent miscoding and reclass churn? (transaction balancing, COA governance, tie-outs, cutoff discipline)

End-of-chapter problems

  1. Post a set of transactions; verify debits=credits by transaction ID.

  2. Redesign a messy COA into an analysis-friendly COA (fewer ambiguous buckets).

  3. Propose a month-end checklist that prevents recurring coding errors.

  4. Define your signed-amount convention and explain how it affects reporting.

Textbook alignment notes

Textbook Part A: Chapters 1–3.