.. _psych_ch3:

Psychological Science & Statistics – Chapter 3
==============================================

Defining and Measuring Variables: How Concepts Become Data
----------------------------------------------------------

Psychology studies things we cannot directly see: attention, emotion,
memory, stress, prejudice, personality, motivation. These are *concepts*,
but our statistical tools require *numbers*. Chapter 3 shows how
psychologists translate ideas into measurable variables, and how good
measurement determines the quality of scientific conclusions.

This chapter has five goals:

* explain conceptual vs. operational definitions,
* introduce the four scales of measurement (NOIR),
* describe reliability and why consistency matters,
* explain validity and how we know a measure works,
* connect these ideas to a short PyStatsV1 lab.

Many of the statistical mistakes students make in later chapters trace back
to the measurement issues covered here.

3.1 Conceptual vs. operational definitions
------------------------------------------

A **conceptual definition** is the idea you care about.
An **operational definition** is how you *measure* that idea.

Examples:

* | *Conceptual*: Anxiety
  | *Operational*: Score on the GAD-7 questionnaire.

* | *Conceptual*: Aggression
  | *Operational*: Number of noise blasts delivered in a competitive reaction-time task.

* | *Conceptual*: Working memory
  | *Operational*: Number of items correctly recalled in a digit-span test.

Why operational definitions matter:

* They determine what the data actually represent.
* Different operationalizations can lead to different conclusions.
* Clarity allows replication — another researcher must be able to reproduce your measure.

A well-designed study states both definitions and links them:

::

    Conceptual definition → Operational definition → Data
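
To make this pipeline concrete, here is a minimal Python sketch of one operational definition from the examples above: anxiety operationalized as a GAD-7 total score. It assumes the standard GAD-7 scoring (seven items, each scored 0 to 3, summed to a total from 0 to 21). The helper name ``gad7_total`` and the response values are hypothetical, chosen only for illustration.

.. code-block:: python

    # Hypothetical helper: operationalize "anxiety" as a GAD-7 total score.
    # Assumes standard GAD-7 scoring: 7 items, each scored 0-3 (total 0-21).
    GAD7_N_ITEMS = 7
    GAD7_VALID_ITEM_SCORES = {0, 1, 2, 3}


    def gad7_total(item_scores: list[int]) -> int:
        """Return the GAD-7 total after checking the responses are well formed."""
        if len(item_scores) != GAD7_N_ITEMS:
            raise ValueError(f"expected {GAD7_N_ITEMS} items, got {len(item_scores)}")
        for score in item_scores:
            if score not in GAD7_VALID_ITEM_SCORES:
                raise ValueError(f"item score {score!r} is outside 0-3")
        return sum(item_scores)


    # One hypothetical participant: concept -> operation -> data
    responses = [1, 2, 0, 1, 3, 2, 1]
    print(gad7_total(responses))  # 10

Writing the definition as code forces every detail (number of items, allowed values) to be explicit, which is exactly what another researcher needs in order to reproduce the measure.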

3.2 Scales of measurement (NOIR)
--------------------------------

Not all numbers behave the same way statistically. Psychologists classify
variables into four **scales of measurement**, often remembered as **NOIR**:

**Nominal (names)**  
    Categories with no inherent order; any numbers assigned to them are only labels.
    *Examples:* gender identity, therapy type, favorite color.

**Ordinal (rank order)**  
    Ordered categories, but distances between ranks are unknown.  
    *Examples:* symptom severity ratings (“mild / moderate / severe”), individual Likert-type items.

**Interval (equal units)**  
    Numeric scales with equal steps, but no true zero.  
    *Examples:* temperature in °C or °F; standardized test scores such as IQ are usually treated as interval.

**Ratio (meaningful zero)**  
    All properties of interval scales plus a true zero.  
    *Examples:* reaction time, number of errors, hours slept.

Why NOIR matters:

* Statistics built on means and standard deviations (such as *t*-tests and Pearson’s *r*) assume interval or ratio data.
* Treating ordinal data as interval is common, but it needs justification.
* Only ratio scales support multiplicative statements (“twice as fast”).
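
A quick numeric check shows why “twice as much” statements need a ratio scale. A ratio scale has a fixed zero, so changing units (milliseconds to seconds) multiplies every value by the same constant and leaves ratios unchanged. An interval scale has an arbitrary zero, so converting °C to °F shifts the zero point and changes the ratio between two values. The numbers below are illustrative, not data.

.. code-block:: python

    def celsius_to_fahrenheit(c: float) -> float:
        # Interval-scale conversion: rescale AND shift the zero point
        return c * 9 / 5 + 32


    def ms_to_seconds(ms: float) -> float:
        # Ratio-scale conversion: rescale only; zero stays zero
        return ms / 1000


    # Reaction times (ratio scale): the ratio survives a change of units
    rt_fast, rt_slow = 300.0, 600.0
    print(rt_slow / rt_fast)                                # 2.0
    print(ms_to_seconds(rt_slow) / ms_to_seconds(rt_fast))  # 2.0

    # Temperatures (interval scale): the "ratio" depends on the arbitrary zero
    cool, warm = 10.0, 20.0
    print(warm / cool)                                                # 2.0
    print(celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool))  # 1.36

If a “twice as much” statement changes when you merely change units, the variable is not on a ratio scale.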

3.3 Reliability: consistency of measurement
-------------------------------------------

A measure must be **reliable** to be useful. Reliability asks:

**“If we measured the same thing again, would we get a similar result?”**

Three major forms:

**Test–retest reliability**  
    Are scores stable across time?  
    Important for traits (e.g., personality).

**Inter-rater reliability**  
    Do two observers agree?  
    Important for coding behavior or scoring essays.

**Internal consistency**  
    Do items on a questionnaire measure the same underlying construct?  
    Measured with statistics like **Cronbach’s alpha**.
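
Cronbach’s alpha compares the variance of the individual items with the variance of the total score. For ``k`` items, ``alpha = k / (k - 1) * (1 - sum(item variances) / var(total scores))``. The sketch below implements that formula with NumPy; the function name ``cronbach_alpha`` and the response matrix are hypothetical illustrations, not a PyStatsV1 API or real study data.

.. code-block:: python

    import numpy as np


    def cronbach_alpha(scores: np.ndarray) -> float:
        """Cronbach's alpha for a (respondents x items) matrix of item scores."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]                         # number of items
        item_vars = scores.var(axis=0, ddof=1)      # sample variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)  # variance of respondents' totals
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)


    # Hypothetical data: 5 respondents answering 4 items on a 1-5 scale
    item_scores = np.array([
        [4, 5, 4, 5],
        [2, 2, 3, 2],
        [3, 3, 3, 4],
        [5, 4, 5, 5],
        [1, 2, 1, 2],
    ])
    print(f"alpha = {cronbach_alpha(item_scores):.2f}")  # alpha = 0.97

With only five respondents the estimate is unstable; real questionnaire development relies on much larger samples.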

Rules of thumb:

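* Reliability of roughly **0.70** or higher is commonly treated as acceptable in early-stage research.
* Low reliability puts an upper bound on validity: random measurement error weakens every correlation a measure can show.
* Reliability is necessary, but not sufficient, for good measurement.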
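
The upper bound on validity comes from classical test theory: the observed correlation between two measures cannot exceed the square root of the product of their reliabilities, ``r_xy <= sqrt(r_xx * r_yy)``. A short sketch makes the ceiling concrete; the function name is hypothetical and the reliability values are illustrative.

.. code-block:: python

    import math


    def max_observed_correlation(rel_x: float, rel_y: float) -> float:
        """Ceiling on an observed correlation given each measure's reliability.

        Classical test theory: r_xy <= sqrt(r_xx * r_yy). The ceiling is reached
        only if the underlying true scores correlate perfectly.
        """
        return math.sqrt(rel_x * rel_y)


    # A scale with reliability 0.70 validated against a criterion with reliability 0.80
    print(f"{max_observed_correlation(0.70, 0.80):.2f}")  # 0.75
    # Even against a perfectly reliable criterion, the ceiling stays below 1
    print(f"{max_observed_correlation(0.70, 1.00):.2f}")  # 0.84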

3.4 Validity: accuracy of measurement
-------------------------------------

If reliability is about consistency, **validity** is about accuracy.

A measure is valid if it accurately captures the construct it claims to measure.

Major forms of validity:

**Face validity**  
    Does the measure *look* like it measures the construct?

**Content validity**  
    Does it cover all relevant aspects of the construct?

**Criterion validity**  
    Do scores relate to an outside criterion they should predict?
    *Example:* Depression scores predicting clinicians’ diagnoses of depression.

**Convergent validity**  
    Does it correlate with other measures of the same construct?

**Discriminant validity**  
    Is it only weakly related (or unrelated) to measures of *different* constructs?
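
Convergent and discriminant validity are usually checked with correlations. The sketch below uses simulated data, not real results: two hypothetical anxiety questionnaires that both reflect the same latent trait, plus a measure generated independently of it. The variable names, noise levels, seed, and sample size are assumptions chosen for illustration.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=3)
    n = 500

    # Simulated latent trait and three observed measures (not real data)
    latent_anxiety = rng.normal(size=n)
    anxiety_scale_a = latent_anxiety + rng.normal(scale=0.5, size=n)
    anxiety_scale_b = latent_anxiety + rng.normal(scale=0.5, size=n)
    unrelated_measure = rng.normal(size=n)  # generated independently of anxiety

    # Each row is one variable, so r is a 3 x 3 correlation matrix
    r = np.corrcoef([anxiety_scale_a, anxiety_scale_b, unrelated_measure])
    convergent = r[0, 1]    # same construct: expect a strong correlation
    discriminant = r[0, 2]  # different construct: expect a correlation near zero
    print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")

In real data, discriminant correlations rarely equal exactly zero; what matters is that they are clearly weaker than the convergent ones.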

Key idea:

::

    A test can be reliable but not valid.
    A test cannot be valid unless it is reliable.
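
The first line of the key idea can be simulated. Suppose, hypothetically, that a measure is very stable over time but actually tracks a different trait from the construct it claims to measure. All values below are simulated under that assumption, and the names are illustrative.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=7)
    n = 500

    # Simulated data: the construct we WANT to measure, and a different stable trait
    target_construct = rng.normal(size=n)
    other_trait = rng.normal(size=n)  # generated independently of the target

    # The flawed measure tracks other_trait with little noise, at two time points
    time1 = other_trait + rng.normal(scale=0.1, size=n)
    time2 = other_trait + rng.normal(scale=0.1, size=n)

    test_retest = np.corrcoef(time1, time2)[0, 1]          # reliability: high
    validity = np.corrcoef(time1, target_construct)[0, 1]  # validity: near zero
    print(f"test-retest r = {test_retest:.2f}, validity r = {validity:.2f}")

High test-retest reliability here shows only that the measure is consistent; it says nothing about whether it captures the intended construct.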

3.5 PyStatsV1 lab: exploring variable types
-------------------------------------------

Let’s look at a tiny example dataset (from a future chapter’s lab) and
use PyStatsV1 conventions to understand variable types.

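.. code-block:: python

    import pandas as pd

    # Example dataset (placeholder path: point this at your own CSV file)
    data = pd.read_csv("data/study1_sleep_anxiety.csv")

    # First five rows: do the values look plausible?
    print(data.head())

    # pandas storage types (int64, float64, object, ...). These describe how
    # values are stored, not their NOIR scale: an int64 column might hold
    # nominal ID codes or ratio-scale error counts.
    print("\nVariable types:")
    print(data.dtypes)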

This simple script begins the habit of **inspecting** variables before
analyzing them — a crucial skill for psychological researchers.

In full PyStatsV1 labs, students will:

* classify variables using NOIR,
* identify operational definitions,
* check reliability (e.g., Cronbach’s alpha),
* assess validity using correlations and scatterplots.
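
As a sketch of the first lab task, the example below records a NOIR scale for each column and encodes categorical variables explicitly in pandas. The mini-dataset and its column names are hypothetical (not the contents of the placeholder CSV), and storing scales in a plain dictionary is an illustration, not a documented PyStatsV1 convention. The point is that pandas dtypes record storage, not scale, so the researcher has to state the scale.

.. code-block:: python

    import pandas as pd

    # Hypothetical mini-dataset; column names are illustrative only
    demo = pd.DataFrame({
        "therapy_type": ["CBT", "waitlist", "CBT", "mindfulness"],
        "symptom_severity": ["mild", "severe", "moderate", "mild"],
        "hours_slept": [7.5, 5.0, 6.0, 8.0],
    })

    # Record each column's NOIR scale by hand; pandas cannot infer it
    noir_scale = {
        "therapy_type": "nominal",
        "symptom_severity": "ordinal",
        "hours_slept": "ratio",
    }

    # Nominal: unordered categories
    demo["therapy_type"] = demo["therapy_type"].astype("category")
    # Ordinal: ordered categories, so comparisons respect mild < moderate < severe
    demo["symptom_severity"] = pd.Categorical(
        demo["symptom_severity"],
        categories=["mild", "moderate", "severe"],
        ordered=True,
    )

    # Summaries that match each scale
    print(demo["therapy_type"].mode()[0])                     # CBT (mode for nominal)
    print(demo["symptom_severity"].max())                     # severe (order for ordinal)
    print((demo["symptom_severity"] >= "moderate").sum())     # 2
    print(demo["hours_slept"].mean())                         # 6.625 (mean for ratio)

Note that no mean is computed for ``symptom_severity``: with an ordered categorical, pandas supports order-based operations but not arithmetic, which mirrors the caution about treating ordinal data as interval.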

Measurement is where scientific thinking meets statistical reasoning.

3.6 What you should take away
-----------------------------

By the end of this chapter you should be able to:

* distinguish conceptual and operational definitions,
* classify variables into nominal, ordinal, interval, or ratio,
* explain why reliability matters and name its three major forms,
* describe common types of validity and why no single type is sufficient,
* see how PyStatsV1 helps document and inspect variables before analysis.

In later chapters, we will build statistical tests on top of this
foundation. Sound conclusions require sound measurement.