Track D: Apply what you learned to your data
This page is the bridge between the Track D running case (LedgerLab + NSO v1) and your own accounting / bookkeeping / finance data.
The promise of Track D: you leave with analyst habits that transfer:
think in tables (grain + keys + joins)
write contracts before analysis
run checks (duplicates, missingness, reconciliations)
produce reproducible outputs (tables + charts + short memo)
Tip
If your starting point is a real export (GnuCash / QuickBooks / bank / invoices), begin with Track D BYOD: Bring Your Own Data to normalize it into Track D’s canonical tables. Then come back to this page for the analysis checklist.
If you haven’t yet, skim these pages first:
Track D Student Edition (Workbook Landing) (entry point)
Track D Dataset Map (what the Track D tables are)
Track D Outputs Guide (what the Track D scripts write)
Two paths
Most student projects fall into one of these paths. Pick the one that matches what you actually have today.
Path |
When it’s a good fit |
|---|---|
Path A: exports only |
You have bank transactions, invoices, bills, payroll exports, but no clean GL detail export. You can still build a monthly “TB-style” table (a consistent monthly rollup) and do real analysis. |
Path B: GL export |
You have a General Ledger detail (or journal export) with debits/credits by account. You can validate fast and get to trial-balance and statement style outputs quickly. |
The “first 30 minutes” recipe below is intentionally minimal. You can make it fancier later.
Setup once (recommended)
Create a small project folder structure and keep raw exports untouched.
my_trackd_project/
raw/ # untouched exports
working/ # renamed columns, cleaned values
outputs/ # your tables + charts + short memo
notes/ # assumptions, mapping notes, QA notes
Privacy note: remove names / emails / account numbers before sharing.
Path A: exports only (bank + invoices/bills)
You can do meaningful analysis without a full GL. Your first goal is a monthly rollup table you can trust. Think of it as “trial-balance style”: consistent columns, consistent sign conventions, reproducible from your exports.
Minimum required columns
Bank transactions (CSV)
posted_date(ordate): transaction date (parseable)description: text descriptionamount: numeric amount (see sign conventions below)(strongly recommended)
bank_txn_id: unique id from the bank export
Invoices / sales (CSV) (if you have them)
invoice_id: unique idinvoice_date(ordate): invoice dateamount: invoice total (or subtotal + tax)(optional but very useful)
customer_idorcustomer(optional)
paid_dateorstatus
Sign conventions (decide early)
Pick one convention and stick to it:
cash inflows are positive
cash outflows are negative
If your bank export uses the opposite (or splits debit/credit into separate columns), normalize it during the “working” step.
First 30 minutes recipe
Copy exports into
raw/.Create cleaned versions in
working/with consistent headers.Produce one table + one chart + one sanity check.
Example script (save as working/path_a_30min.py and run with python working/path_a_30min.py):
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
ROOT = Path(__file__).resolve().parents[1]
raw_bank = ROOT / "raw" / "bank.csv"
out_dir = ROOT / "outputs"
out_dir.mkdir(exist_ok=True)
bank = pd.read_csv(raw_bank)
# --- Required columns (rename if needed)
required = {"posted_date", "description", "amount"}
missing = required - set(bank.columns)
if missing:
raise ValueError(f"Bank export missing columns: {sorted(missing)}")
# --- Dates
bank["posted_date"] = pd.to_datetime(bank["posted_date"], errors="coerce")
if bank["posted_date"].isna().any():
bad = bank.loc[bank["posted_date"].isna(), :].head(5)
raise ValueError(f"Some bank dates failed to parse. Examples:\n{bad}")
# --- Amounts
bank["amount"] = pd.to_numeric(bank["amount"], errors="coerce")
if bank["amount"].isna().any():
bad = bank.loc[bank["amount"].isna(), :].head(5)
raise ValueError(f"Some bank amounts failed to parse. Examples:\n{bad}")
# --- Optional QA: duplicate IDs
if "bank_txn_id" in bank.columns:
dup_ids = bank.loc[bank["bank_txn_id"].duplicated(), "bank_txn_id"].unique()
if len(dup_ids) > 0:
print(f"WARNING: duplicate bank_txn_id (showing up to 10): {dup_ids[:10]}")
# --- Month rollup
bank["month"] = bank["posted_date"].dt.to_period("M").astype(str)
g = bank.groupby("month")["amount"]
monthly = pd.DataFrame(
{
"cash_in": g.apply(lambda s: s[s > 0].sum()),
"cash_out": g.apply(lambda s: -s[s < 0].sum()),
"net_cash": g.sum(),
"n_txns": g.size(),
}
).reset_index()
# Sanity check: net_cash = cash_in - cash_out
max_err = (monthly["cash_in"] - monthly["cash_out"] - monthly["net_cash"]).abs().max()
if max_err > 1e-6:
raise ValueError(f"Sanity check failed (cash_in - cash_out != net_cash). max_err={max_err}")
# Save the table
monthly.to_csv(out_dir / "monthly_cash_summary.csv", index=False)
# One chart: net cash by month
plt.figure()
plt.plot(monthly["month"], monthly["net_cash"], marker="o")
plt.xticks(rotation=45, ha="right")
plt.title("Net cash flow by month")
plt.tight_layout()
plt.savefig(out_dir / "net_cash_by_month.png", dpi=150)
print("Wrote:")
print(" -", out_dir / "monthly_cash_summary.csv")
print(" -", out_dir / "net_cash_by_month.png")
What you just did is the Track D pattern:
define a contract (required columns)
normalize types (dates, numeric)
run one QA check (duplicates)
produce a reproducible table + chart
From there, you can grow into richer questions (cash forecasting, seasonality, outliers, customer payment behavior if you have invoices).
Path B: GL export (journal / general ledger detail)
If you have GL detail, you can validate and build trial-balance style tables fast.
Minimum required columns
At minimum:
date: transaction dateaccount_id(oraccount): account identifierdebit: numeric debit amount (0 if none)credit: numeric credit amount (0 if none)
Strongly recommended:
txn_id(orjournal_entry_id): lets you check balance per transactiondescription
Even better (optional):
a Chart of Accounts export you can join on
account_idwithaccount_nameandaccount_type
First 30 minutes recipe
Example script (save as working/path_b_30min.py and run with python working/path_b_30min.py):
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
ROOT = Path(__file__).resolve().parents[1]
raw_gl = ROOT / "raw" / "gl_detail.csv"
out_dir = ROOT / "outputs"
out_dir.mkdir(exist_ok=True)
gl = pd.read_csv(raw_gl)
required = {"date", "account_id", "debit", "credit"}
missing = required - set(gl.columns)
if missing:
raise ValueError(f"GL export missing columns: {sorted(missing)}")
gl["date"] = pd.to_datetime(gl["date"], errors="coerce")
if gl["date"].isna().any():
bad = gl.loc[gl["date"].isna(), :].head(5)
raise ValueError(f"Some GL dates failed to parse. Examples:\n{bad}")
for col in ["debit", "credit"]:
gl[col] = pd.to_numeric(gl[col], errors="coerce").fillna(0.0)
gl["month"] = gl["date"].dt.to_period("M").astype(str)
# Check 1: debits = credits per month (basic reconciliation)
monthly_dc = gl.groupby("month")[["debit", "credit"]].sum().reset_index()
monthly_dc["diff"] = monthly_dc["debit"] - monthly_dc["credit"]
max_diff = monthly_dc["diff"].abs().max()
if max_diff > 1e-6:
print(monthly_dc)
raise ValueError(
"Debits != credits by month. This often means the export was filtered, "
"or some lines are missing. Fix this before you trust any analysis."
)
# Optional Check 2: debits = credits per txn_id (stronger)
if "txn_id" in gl.columns:
per_txn = gl.groupby("txn_id")[["debit", "credit"]].sum()
per_txn["diff"] = per_txn["debit"] - per_txn["credit"]
bad = per_txn[per_txn["diff"].abs() > 1e-6]
if len(bad) > 0:
raise ValueError(
f"Found {len(bad)} unbalanced txn_id rows. "
"Your export may be incomplete or mis-keyed."
)
# Build a monthly TB-style table: debit, credit, net (debit-positive convention)
tb = gl.groupby(["month", "account_id"], as_index=False)[["debit", "credit"]].sum()
tb["net"] = tb["debit"] - tb["credit"]
tb.to_csv(out_dir / "tb_monthly.csv", index=False)
# One chart: total debits and credits by month (a quick integrity picture)
plt.figure()
plt.plot(monthly_dc["month"], monthly_dc["debit"], marker="o", label="debit")
plt.plot(monthly_dc["month"], monthly_dc["credit"], marker="o", label="credit")
plt.xticks(rotation=45, ha="right")
plt.title("GL totals by month (should match)")
plt.legend()
plt.tight_layout()
plt.savefig(out_dir / "gl_debits_credits_by_month.png", dpi=150)
print("Wrote:")
print(" -", out_dir / "tb_monthly.csv")
print(" -", out_dir / "gl_debits_credits_by_month.png")
You now have:
a reconciliation check (debits == credits)
a monthly TB-style table you can analyze
a chart you can put in a memo
From here, you can add one join (Chart of Accounts) and start doing Track D-style statement rollups or account-level diagnostics.
Common pitfalls (and what to do)
Date parsing
Mixed formats (
MM/DD/YYYYvsYYYY-MM-DD) cause silent errors.Always use
pd.to_datetime(..., errors="coerce")and fail fast if you get nulls.
Signs and “negative” exports
Some systems export credits as negative numbers instead of a separate credit column.
Decide your convention, normalize once, and document it in
notes/assumptions.txt.
Duplicate IDs
Bank feeds can duplicate rows (Track D includes this on purpose).
If you have IDs, check duplicates and decide: remove exact duplicates, or keep and flag.
Missing months / incomplete exports
If debits != credits by month, the export is likely filtered or incomplete.
Fix the export first. Don’t “patch” it with analysis code.
Month boundaries
Accruals, backdated entries, and late postings create confusing month swings.
Track D teaches the right response: describe it, measure it, and document the limitation.
Template header pack (copy/paste)
If you want a starting point without hunting for column names, these tiny CSV templates are safe to download and copy into your own project.
Next steps
Once you can produce one table + one chart + one check, you are ready for:
a short “executive summary” memo (what you found + what limits confidence)
deeper Track D-style joins (drivers, segmentation)
classroom labs and rubrics (the next Track D docs pages)