(3) Experimental Data Quality and Error Checks – Presentation

Module 3 Presentation: Experimental Data Quality and Error Checks

This module teaches you how to detect data errors before they distort your conclusions. Quality checks are part of the science, not optional cleanup.

Beginner outcome: you will run a repeatable quality checklist and document what you changed.

1) Typical quality problems

Missing values (empty cells or NA).
Inconsistent units (cm and mm mixed in one column).
Typos in categories (“Treatmnt” vs “Treatment”).
Out-of-range values (for example, negative counts, which are impossible).
Duplicate IDs.
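
To make these problems concrete, here is a small made-up data frame that contains one of each. The column names plant_id, treatment, and height_cm match the ones used in this module; the batch column and every value are invented for illustration and are not real measurements.

```r
# Toy data, invented for illustration only (not real measurements).
# The batch column is an assumption added to show a units mix-up.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103, 104, 105),          # 103 appears twice (duplicate ID)
  treatment = c("Control", "Treatment", "Treatmnt",     # "Treatmnt" is a typo
                "Treatmnt", "Control", "Treatment"),
  batch     = c("A", "A", "A", "A", "B", "B"),
  height_cm = c(12.5, NA, 14.1, 14.1, 131, -3.0),       # NA = missing, 131 = likely mm, -3.0 = impossible
  stringsAsFactors = FALSE
)

print(df)
```

Printing a small table like this is often enough to spot problems by eye, but real data sets are usually too large for that, which is why the checks are run as code.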

2) Fast quality checklist

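# These checks assume your data is already loaded into a data frame named df.

# Missing values per column (counts the NA entries in each column)
colSums(is.na(df))

# Duplicate IDs: how many, and which rows they are
sum(duplicated(df$plant_id))
df[duplicated(df$plant_id), ]

# Category spelling check: lists every distinct label so typos stand out
unique(df$treatment)

# Numeric range check: inspect Min. and Max. for impossible values
summary(df$height_cm)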
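
One way to make the checklist repeatable is to bundle the checks into a single function and call it every time the data changes. The sketch below is only an illustration: run_qc is a hypothetical helper name, the toy data is invented, and the function assumes the columns are called plant_id, treatment, and height_cm as in the checks above.

```r
# Hypothetical helper: bundles the four checks so they run the same way every time.
# Assumes columns named plant_id, treatment, and height_cm.
run_qc <- function(df) {
  list(
    missing_per_column = colSums(is.na(df)),                 # NA count per column
    n_duplicate_ids    = sum(duplicated(df$plant_id)),       # repeated IDs after the first
    treatment_labels   = sort(unique(df$treatment)),         # every distinct label
    height_range       = range(df$height_cm, na.rm = TRUE)   # smallest and largest height
  )
}

# Toy data, invented for illustration only.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103),
  treatment = c("Control", "Treatment", "Treatmnt", "Treatmnt"),
  height_cm = c(12.5, NA, 14.1, -3.0),
  stringsAsFactors = FALSE
)

qc <- run_qc(df)
str(qc)
```

Running the same function before and after your corrections gives a simple before/after comparison that you can keep with your notes.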

3) Document every correction

Create a simple log in your script as comments:

# QC log
# 2026-07-02: corrected treatment typo 'Treatmnt' -> 'Treatment'
# 2026-07-02: removed duplicated row for plant_id 103
# 2026-07-02: converted height from mm to cm for batch B
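
The log entries above could sit directly next to the code that performs each correction. The following is a minimal sketch under stated assumptions: the data is invented, the batch column is hypothetical (the module does not say how batches are recorded), and the duplicated row is assumed to be an exact copy so that keeping the first occurrence is safe.

```r
# Toy data, invented for illustration only. The batch column is an assumption.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103, 104),
  treatment = c("Control", "Treatmnt", "Treatment", "Treatment", "Control"),
  batch     = c("A", "A", "A", "A", "B"),
  height_cm = c(12.5, 13.0, 14.1, 14.1, 131),
  stringsAsFactors = FALSE
)

# QC log
# 2026-07-02: corrected treatment typo 'Treatmnt' -> 'Treatment'
df$treatment[which(df$treatment == "Treatmnt")] <- "Treatment"

# 2026-07-02: removed duplicated row for plant_id 103
# duplicated() flags the second and later occurrences, so the first row is kept.
df <- df[!duplicated(df$plant_id), ]

# 2026-07-02: converted height from mm to cm for batch B (10 mm = 1 cm)
batch_b_rows <- which(df$batch == "B")
df$height_cm[batch_b_rows] <- df$height_cm[batch_b_rows] / 10

print(df)
```

Keeping each comment beside the line that makes the change means the log and the actual corrections are less likely to drift apart.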

4) Why this protects your results

Prevents garbage-in-garbage-out analysis.
Makes your process auditable by collaborators.
Improves reproducibility for publication and peer review.

Next step: run the checks yourself with the guided exercises in Practical Module 3.