The Track D dataset contract (what scripts expect)
==================================================

**Why this exists:** Track D works because every chapter agrees on a shared data contract. This chapter explains the contract at a high level.

Learning objectives
-------------------

- Know the minimum tables required for GL-based analysis (``chart_of_accounts`` + ``gl_journal``).
- Explain what ``normalized/`` outputs are and why we prefer them for analysis.
- Understand where synthetic datasets come from (seeded, reproducible).

Outline
-------

Inputs vs normalized outputs
----------------------------

- BYOD projects store raw exports under ``tables/`` (source-specific).
- Normalization produces ``normalized/chart_of_accounts.csv`` and ``normalized/gl_journal.csv`` (canonical).
- Everything after normalization is “just analysis”: downstream scripts work from the canonical files, not the raw exports.
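
As a rough illustration of the raw-to-canonical step, the sketch below renames source-specific headers to canonical ones. The header mapping, the column names (``account_id``, ``date``, ``amount``) and the helper ``normalize_rows`` are hypothetical and chosen only for this example; they are not taken from the Track D contract or from ``pystatsv1``.

```python
import csv
import io

# Hypothetical mapping from one source system's headers to canonical names.
# The real canonical column names come from the Track D contract, not from here.
HEADER_MAP = {"Acct No": "account_id", "Posting Date": "date", "Amt": "amount"}


def normalize_rows(raw_csv_text, header_map):
    """Rename source-specific headers to canonical ones and drop unmapped columns."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    return [
        {canonical: row[source] for source, canonical in header_map.items()}
        for row in reader
    ]


# A tiny stand-in for a raw export that would live under tables/.
raw_export = "Acct No,Posting Date,Amt,Memo\n1000,2024-01-31,250.00,Opening\n"
normalized = normalize_rows(raw_export, HEADER_MAP)
print(normalized)
```

Only the mapping is source-specific in this sketch; anything that consumes ``normalized`` sees the same keys regardless of which system produced the export.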

Column naming and why it matters
--------------------------------

- Stable column headers allow scripts to be reused across systems.
- If headers drift, the run should fail early (during normalize/validate) rather than silently produce bad analysis.
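
The "fail early" idea can be sketched as a header check that raises before any analysis runs. The required column names and the function ``check_headers`` are assumptions made for this example, not the actual ``pystatsv1`` implementation.

```python
import csv
import io

# Hypothetical required columns for a normalized GL journal.
REQUIRED_COLUMNS = ("account_id", "date", "amount")


def check_headers(csv_text, required):
    """Return the header row, or raise ValueError if a required column is missing."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    missing = [name for name in required if name not in headers]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return headers


good = "account_id,date,amount\n1000,2024-01-31,250.00\n"
drifted = "Account,date,amount\n1000,2024-01-31,250.00\n"  # header was renamed upstream

print(check_headers(good, REQUIRED_COLUMNS))
try:
    check_headers(drifted, REQUIRED_COLUMNS)
except ValueError as err:
    print(f"validation failed early: {err}")
```

Raising an exception here is a design choice for the sketch: a renamed header stops the run with a message that names the missing column, instead of letting a later script compute on the wrong data.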

What ``pystatsv1 trackd validate`` does conceptually
----------------------------------------------------

- Uses a profile (for example, ``core_gl``) to decide what tables/columns are required.
- Checks basic schema and required columns.
- Catches common data issues: missing dates, non-numeric amounts, or malformed account identifiers.
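
A conceptual sketch of the row-level checks listed above follows. It is not the validator's real code: the column names, the function ``find_row_issues``, and the rule that an account identifier must be all digits are assumptions made for illustration.

```python
def find_row_issues(rows):
    """Return (row_number, problem) pairs for the three issue types named above."""
    issues = []
    for number, row in enumerate(rows, start=1):
        # Missing date: empty or whitespace-only value.
        if not (row.get("date") or "").strip():
            issues.append((number, "missing date"))

        # Non-numeric amount: anything float() cannot parse.
        try:
            float(row.get("amount", ""))
        except (TypeError, ValueError):
            issues.append((number, "non-numeric amount"))

        # Malformed account identifier: assumed here to mean "not all digits".
        if not (row.get("account_id") or "").strip().isdigit():
            issues.append((number, "malformed account identifier"))
    return issues


rows = [
    {"account_id": "1000", "date": "2024-01-31", "amount": "250.00"},
    {"account_id": "10-A", "date": "", "amount": "abc"},
]
issues = find_row_issues(rows)
for number, problem in issues:
    print(f"row {number}: {problem}")
```

Collecting every issue before reporting, rather than stopping at the first one, lets a single validation pass show everything that needs fixing in the export.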

Where this connects in the workbook
-----------------------------------

- :doc:`../track_d_dataset_map` (table-by-table map)
- :doc:`../track_d_outputs_guide` (artifacts and how to use them)
- :doc:`../track_d_byod` (normalization and validation commands)

.. note::

   This page is intentionally an outline right now. Expand it incrementally as we refine the Track D narrative.