Psychological Science & Statistics – Chapter 3
Defining and Measuring Variables: How Concepts Become Data
Psychology studies things we cannot directly see: attention, emotion, memory, stress, prejudice, personality, motivation. These are concepts, but our statistical tools require numbers. Chapter 3 shows how psychologists translate ideas into measurable variables, and how good measurement determines the quality of scientific conclusions.
This chapter has five goals:
explain conceptual vs. operational definitions,
introduce the four scales of measurement (NOIR),
describe reliability and why consistency matters,
explain validity and how we know a measure works,
connect these ideas to a short PyStatsV1 lab.
Many of the statistical mistakes students make in later chapters trace back to the measurement decisions covered here.
3.1 Conceptual vs. operational definitions
A conceptual definition is the abstract idea (the construct) you care about. An operational definition specifies exactly how you measure that idea.
Examples:
Conceptual: Anxiety → Operational: Score on the GAD-7 questionnaire.
Conceptual: Aggression → Operational: Number of noise blasts delivered in a competitive reaction-time task.
Conceptual: Working memory → Operational: Number of items correctly recalled in a digit-span test.
Why operational definitions matter:
They determine what the data actually represent.
Different operationalizations can lead to different conclusions.
Clarity allows replication — another researcher must be able to reproduce your measure.
A strong research question specifies both, in this order:
Conceptual definition → Operational definition → Data
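Writing an operational definition as code forces every detail into the open. Below is a minimal sketch, assuming each participant's responses arrive as a list of seven integers: the GAD-7 has seven items, each scored 0 to 3, so the total ranges from 0 to 21. The function name gad7_total is hypothetical and is not part of PyStatsV1.

```python
def gad7_total(responses: list[int]) -> int:
    """Operational definition of anxiety: the GAD-7 total score.

    Each of the 7 items is answered on a 0-3 scale
    (0 = "not at all", 3 = "nearly every day"), so totals run 0-21.
    """
    if len(responses) != 7:
        raise ValueError("GAD-7 has exactly 7 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(responses)


# One made-up participant's item responses
print(gad7_total([1, 2, 0, 1, 3, 1, 0]))  # → 8
```

Rejecting out-of-range responses is a deliberate choice: a data-entry typo such as a 4 would otherwise silently inflate someone's anxiety score.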
3.2 Scales of measurement (NOIR)
Not all numbers behave the same way statistically. Psychologists classify variables into four scales of measurement, often remembered as NOIR:
- Nominal (names)
Categories with no numeric meaning. Examples: gender identity, therapy type, favorite color.
- Ordinal (rank order)
Ordered categories, but distances between ranks are unknown. Examples: symptom severity ratings (“mild / moderate / severe”), Likert scales.
- Interval (equal units)
Numeric scales with equal steps between values but no true zero. Examples: temperature in °C or °F; many psychological test scores are treated as interval.
- Ratio (meaningful zero)
All properties of interval scales plus a true zero. Examples: reaction time, number of errors, hours slept.
Why NOIR matters:
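Statistics built on means and standard deviations (such as t tests and Pearson correlations) assume interval or ratio data.
Treating ordinal data (such as individual Likert items) as interval is common, but it needs justification.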
Ratio scales allow multiplicative statements (“twice as fast”).
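As a sketch of how the scale limits which summaries make sense, the example below builds a made-up five-row dataset with one variable per NOIR scale. The column names and values are invented for illustration; the pandas calls (pd.Categorical, value_counts, mode, cat.codes, quantile) are standard pandas.

```python
import pandas as pd

# A made-up five-person dataset with one variable per scale of measurement
df = pd.DataFrame({
    "therapy": ["CBT", "ACT", "CBT", "CBT", "ACT"],                   # nominal
    "severity": ["mild", "severe", "moderate", "mild", "moderate"],   # ordinal
    "room_temp_c": [20.0, 22.0, 21.0, 19.0, 23.0],                    # interval
    "rt_ms": [310, 450, 380, 300, 600],                               # ratio
})

# Ordinal: declare the order explicitly instead of relying on alphabetical order
df["severity"] = pd.Categorical(
    df["severity"], categories=["mild", "moderate", "severe"], ordered=True
)

# Nominal: frequencies and the mode are meaningful, nothing more
print(df["therapy"].value_counts())
print("Most common therapy:", df["therapy"].mode()[0])

# Ordinal: the median rank is meaningful (mild=0, moderate=1, severe=2)
codes = df["severity"].cat.codes
median_code = int(codes.quantile(0.5, interpolation="lower"))
median_label = df["severity"].cat.categories[median_code]
print("Median severity:", median_label)

# Interval: means and differences are meaningful, but 20 °C is not "twice" 10 °C
print("Mean room temperature (°C):", df["room_temp_c"].mean())

# Ratio: a true zero makes "twice as slow" meaningful
print("Slowest RT / fastest RT:", df["rt_ms"].max() / df["rt_ms"].min())
```

Using interpolation="lower" keeps the ordinal median on an actual category rather than averaging two ranks, which would assume equal distances between them.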
3.3 Reliability: consistency of measurement
A measure must be reliable to be useful. Reliability asks:
“If we measured the same thing again, would we get a similar result?”
Three major forms:
- Test–retest reliability
Are scores stable across time? Important for traits (e.g., personality).
- Inter-rater reliability
Do two observers agree? Important for coding behavior or scoring essays.
- Internal consistency
Do items on a questionnaire measure the same underlying construct? Measured with statistics like Cronbach’s alpha.
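Each of the three forms can be computed in a few lines of NumPy. In this sketch, test-retest reliability is a Pearson correlation between two sessions, inter-rater reliability is Cohen's kappa (one common choice when two raters assign categories), and internal consistency is Cronbach's alpha, α = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function names and the small data sets are made up for illustration and are not PyStatsV1 functions.

```python
import numpy as np


def retest_reliability(time1, time2):
    """Test-retest reliability: Pearson r between two testing sessions."""
    return float(np.corrcoef(time1, time2)[0, 1])


def cohens_kappa(rater_a, rater_b):
    """Inter-rater reliability: agreement between two raters, corrected for chance."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_observed = np.mean(a == b)
    # Chance agreement: for each category, multiply the two raters' base rates
    categories = np.union1d(a, b)
    p_chance = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return float((p_observed - p_chance) / (1 - p_chance))


def cronbach_alpha(items):
    """Internal consistency for a participants x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_vars = items.var(axis=0, ddof=1)           # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return float(k / (k - 1) * (1 - item_vars.sum() / total_var))


# Made-up scores for five people tested on two occasions
print(retest_reliability([10, 14, 8, 20, 15], [11, 13, 9, 19, 16]))

# Two observers coding six play sessions as aggressive or not
coder_1 = ["agg", "agg", "none", "none", "agg", "none"]
coder_2 = ["agg", "none", "none", "none", "agg", "none"]
print(cohens_kappa(coder_1, coder_2))  # 5/6 observed vs. 1/2 chance agreement

# Four participants answering a three-item questionnaire
print(cronbach_alpha([[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]))
```

Kappa is reported instead of raw percent agreement because two coders who both label most sessions "none" would agree often by chance alone.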
Rules of thumb:
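A reliability coefficient near .70 is often treated as acceptable in early-stage research; measures used to make decisions about individuals usually need much higher values.
Low reliability puts an upper bound on validity: a measure dominated by random error cannot correlate strongly with anything.
Reliability is necessary, but not sufficient, for good measurement.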
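Classical test theory makes that upper bound concrete. If a measure X has reliability r_XX and a criterion Y has reliability r_YY, the observed correlation between them (the validity coefficient) cannot exceed the square root of their product:

```latex
r_{XY} \le \sqrt{r_{XX}\, r_{YY}}
\qquad \text{e.g. } r_{XX} = .49,\; r_{YY} = 1 \;\Rightarrow\; r_{XY} \le \sqrt{.49} = .70
```

So a questionnaire with reliability .49 could not correlate above .70 even with a perfectly measured criterion.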
3.4 Validity: accuracy of measurement
If reliability is about consistency, validity is about truth.
A measure is valid if it accurately captures the construct it claims to measure.
Major forms of validity:
- Face validity
Does the measure look like it measures the construct?
- Content validity
Does it cover all relevant aspects of the construct?
- Criterion validity
Do scores predict something they should predict? Example: A depression score predicting therapist diagnoses.
- Convergent validity
Does it correlate with other measures of the same construct?
- Discriminant validity
Is it only weakly correlated with measures of different constructs?
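Convergent and discriminant validity are both checked with correlations. The sketch below uses simulated data, not real results: two anxiety questionnaires are generated as the same latent anxiety plus independent noise, and a vocabulary score is generated independently. The sample size, noise level, and seed are arbitrary assumptions. By construction, the two anxiety scales should correlate around .80 (convergent) and anxiety should correlate near zero with vocabulary (discriminant).

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200  # simulated participants

# Simulated, not real: each person's true (latent) anxiety level
latent_anxiety = rng.normal(size=n)

# Two different anxiety questionnaires = the same latent anxiety + independent noise
anxiety_scale_a = latent_anxiety + rng.normal(scale=0.5, size=n)
anxiety_scale_b = latent_anxiety + rng.normal(scale=0.5, size=n)

# A measure of a different construct, generated independently of anxiety
vocabulary_score = rng.normal(size=n)

# Rows of the input are variables, so r is a 3 x 3 correlation matrix
r = np.corrcoef([anxiety_scale_a, anxiety_scale_b, vocabulary_score])
print("Convergent r (anxiety A vs. anxiety B):", round(r[0, 1], 2))
print("Discriminant r (anxiety A vs. vocabulary):", round(r[0, 2], 2))
```

Real discriminant checks usually compare against a related but distinct construct (for example, depression), where the expected correlation is modest rather than zero; the independent vocabulary score here is a simplification.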
Key idea:
A test can be reliable but not valid.
A test cannot be valid unless it is reliable.
3.5 PyStatsV1 lab: exploring variable types
Let’s look at a tiny example dataset (from a future chapter’s lab) and use PyStatsV1 conventions to understand variable types.
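```python
import pandas as pd

# Example dataset (placeholder path; point this at your own copy of the lab data)
data = pd.read_csv("data/study1_sleep_anxiety.csv")

# First five rows: what do the values in each column look like?
print(data.head())

# How pandas stored each column (int64, float64, object, ...)
print("\nVariable types:")
print(data.dtypes)
```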
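This short script builds the habit of inspecting variables before analyzing them, a crucial skill for psychological researchers. Keep in mind that pandas dtypes describe how values are stored, not their scale of measurement: a therapy-type variable coded 1, 2, 3 will show up as int64 even though it is nominal.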
In full PyStatsV1 labs, students will:
classify variables using NOIR,
identify operational definitions,
check reliability (e.g., Cronbach’s alpha),
assess validity using correlations and scatterplots.
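One way to record NOIR decisions so that the software respects them is a small codebook. This is a minimal sketch: the column names, codes, and values are hypothetical (the real columns of study1_sleep_anxiety.csv are not shown here), and apply_codebook is an illustrative helper, not a PyStatsV1 function.

```python
import pandas as pd

# Hypothetical codebook: column -> (NOIR scale, operational definition)
CODEBOOK = {
    "condition": ("nominal", "1 = sleep-restricted, 2 = control"),
    "stress_rating": ("ordinal", "single item, 1 (low) to 5 (high)"),
    "gad7_total": ("interval", "GAD-7 total score, 0-21"),
    "hours_slept": ("ratio", "self-reported hours slept last night"),
}


def apply_codebook(df: pd.DataFrame, codebook: dict) -> pd.DataFrame:
    """Return a copy whose dtypes reflect each column's scale of measurement."""
    out = df.copy()
    for col, (scale, _definition) in codebook.items():
        if scale == "nominal":
            # Unordered categories: codes 1 and 2 are labels, not quantities
            out[col] = pd.Categorical(out[col])
        elif scale == "ordinal":
            # Ordered categories inferred from the observed values, sorted ascending
            out[col] = pd.Categorical(out[col], ordered=True)
        # Interval and ratio columns stay numeric
    return out


# Made-up raw data: every column arrives as a plain number
raw = pd.DataFrame({
    "condition": [1, 2, 1, 2],
    "stress_rating": [4, 2, 5, 3],
    "gad7_total": [12, 5, 15, 7],
    "hours_slept": [5.0, 7.5, 4.5, 8.0],
})
typed = apply_codebook(raw, CODEBOOK)
print(raw.dtypes)
print(typed.dtypes)
```

Inferring ordinal levels from the data means an unused level (here, nobody chose 1) is dropped; listing the full set of levels in the codebook would avoid that.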
Measurement is where scientific thinking meets statistical reasoning.
3.6 What you should take away
By the end of this chapter you should be able to:
distinguish conceptual and operational definitions,
classify variables into nominal, ordinal, interval, or ratio,
explain why reliability matters and name its three major forms,
describe common types of validity and why no single type is sufficient,
see how PyStatsV1 helps document and inspect variables before analysis.
In later chapters, we will build statistical tests on top of this foundation. Sound conclusions require sound measurement.