The Track D dataset contract (what scripts expect)
Why this exists: Track D works because every chapter agrees on a shared data contract. This chapter explains the contract at a high level.
Learning objectives
Know the minimum tables required for GL-based analysis (chart_of_accounts + gl_journal).
Explain what normalized/ outputs are and why we prefer them for analysis.
Understand where synthetic datasets come from (seeded, reproducible).
Outline
Inputs vs normalized outputs
BYOD projects store raw exports under tables/ (source-specific).
Normalization produces normalized/chart_of_accounts.csv and normalized/gl_journal.csv (canonical).
Everything after that is “just analysis.”
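To make the raw-versus-canonical split concrete, here is a minimal sketch of a normalization step. The source headers ("Acct No", "Posting Date", "Amt") and the canonical names (account_id, date, amount) are illustrative assumptions, not the actual Track D schema, and the example works in memory rather than reading from tables/ or writing to normalized/.

```python
import csv
import io

# Hypothetical mapping from one source system's headers to canonical names.
# Both sides are assumptions for illustration, not the real Track D contract.
COLUMN_MAP = {
    "Acct No": "account_id",
    "Posting Date": "date",
    "Amt": "amount",
}


def normalize_rows(raw_csv_text, column_map):
    """Rename source-specific headers to canonical ones and drop unmapped columns."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    return [
        {canonical: row[source] for source, canonical in column_map.items()}
        for row in reader
    ]


# A raw export as it might appear under tables/ (contents invented for the example).
raw_export = "Acct No,Posting Date,Amt,Memo\n1000,2024-01-31,250.00,Opening\n"
normalized = normalize_rows(raw_export, COLUMN_MAP)
print(normalized)
```

Only the mapping is source-specific. Anything that consumes the normalized rows can be written once against the canonical names.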
Column naming and why it matters
Stable column headers allow scripts to be reused across systems.
If headers drift, you want a failure early (during normalize/validate), not silent bad analysis.
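The fail-early idea can be sketched as a header check that runs before any analysis. The expected header names and the HeaderDriftError class are hypothetical, introduced only for this example.

```python
import csv
import io

# Assumed canonical headers for illustration; the real contract may differ.
EXPECTED_HEADERS = ["account_id", "date", "amount"]


class HeaderDriftError(ValueError):
    """Raised when a table's headers do not satisfy the expected contract."""


def check_headers(csv_text, expected):
    """Return the actual header row, or raise if any expected column is missing."""
    reader = csv.reader(io.StringIO(csv_text))
    actual = next(reader, [])  # an empty file yields an empty header list
    missing = [name for name in expected if name not in actual]
    if missing:
        raise HeaderDriftError(
            f"missing required columns: {missing}; found: {actual}"
        )
    return actual


# A renamed column ("Account" instead of "account_id") stops the run immediately.
try:
    check_headers("Account,date,amount\n1000,2024-01-31,250.00\n", EXPECTED_HEADERS)
except HeaderDriftError as exc:
    print(f"validation failed: {exc}")
```

Raising an exception, rather than skipping or guessing at the column, is what keeps a renamed header from turning into silently wrong totals downstream.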
What pystatsv1 trackd validate does conceptually
Uses a profile (for example, core_gl) to decide what tables/columns are required.
Checks basic schema and required columns.
Catches common data issues: missing dates, non-numeric amounts, or malformed account identifiers.
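The checks above can be pictured with a small sketch. This is not the pystatsv1 implementation: the PROFILES layout, the column names, and the rule that a blank account identifier counts as malformed are all assumptions made for illustration.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical profile: table name -> required columns (not the real core_gl definition).
PROFILES = {
    "core_gl": {
        "chart_of_accounts": ["account_id", "account_name"],
        "gl_journal": ["account_id", "date", "amount"],
    }
}


def is_iso_date(text):
    """True if text parses as an ISO date such as 2024-01-31."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def is_number(text):
    """True if text parses as a finite decimal number."""
    try:
        return Decimal(text).is_finite()
    except InvalidOperation:
        return False


def validate_gl_journal(rows, profile="core_gl"):
    """Return a list of problems found; an empty list means the rows passed."""
    required = PROFILES[profile]["gl_journal"]
    problems = []
    for number, row in enumerate(rows, start=1):
        missing = [col for col in required if col not in row]
        if missing:
            problems.append(f"row {number}: missing columns {missing}")
            continue  # value checks below need every required column
        if not is_iso_date(row["date"]):
            problems.append(f"row {number}: missing or invalid date")
        if not is_number(row["amount"]):
            problems.append(f"row {number}: non-numeric amount")
        if not row["account_id"].strip():
            problems.append(f"row {number}: malformed account identifier")
    return problems


rows = [
    {"account_id": "1000", "date": "2024-01-31", "amount": "250.00"},
    {"account_id": "", "date": "", "amount": "abc"},
]
for problem in validate_gl_journal(rows):
    print(problem)
```

Collecting every problem into a list, instead of stopping at the first one, lets a single run report all the rows that need fixing in the source export.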
Where this connects in the workbook
Track D Dataset Map (table-by-table map)
Track D Outputs Guide (artifacts and how to use them)
Track D BYOD: Bring Your Own Data (normalization and validation commands)
Note
This page is intentionally an outline right now. Expand it incrementally as we refine the Track D narrative.