Module 3 Presentation: Experimental Data Quality and Error Checks
This module teaches how to detect data errors before they distort your conclusions. Quality checks are part of science, not optional cleanup.
Beginner outcome: you will run a repeatable quality checklist and document what you changed.
1) Typical quality problems
- Missing values (empty cells or NA).
- Wrong units (cm vs mm mixed in one column).
- Typos in categories (“Treatmnt” vs “Treatment”).
- Out-of-range values (for example, negative counts, which are impossible).
- Duplicate IDs.
2) Fast quality checklist
# Missing values per column
colSums(is.na(df))
# Duplicate IDs
sum(duplicated(df$plant_id))
# Category spelling check
unique(df$treatment)
# Numeric range check
summary(df$height_cm)
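To see what the checklist reports, it can help to run it on a small data frame that contains each kind of problem. The sketch below is only an illustration: the toy data, the expected treatment labels, and the 0 to 100 cm plausible range are all assumptions made for this example, not values from a real experiment.

```r
# Hypothetical toy data; column names follow the checklist above.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103, 104),
  treatment = c("Control", "Treatmnt", "Treatment", "Treatment", "Control"),
  height_cm = c(12.5, NA, 14.1, 14.1, 131.0),
  stringsAsFactors = FALSE
)

# Missing values per column
n_missing <- colSums(is.na(df))

# Number of rows whose plant_id repeats an earlier row
n_duplicate_ids <- sum(duplicated(df$plant_id))

# Category labels that are not in the expected set (assumed labels)
expected_levels <- c("Control", "Treatment")
unexpected_labels <- setdiff(unique(df$treatment), expected_levels)

# Row numbers whose height falls outside the assumed range of 0 to 100 cm;
# which() skips missing values, so NA rows are not flagged here
out_of_range <- which(df$height_cm < 0 | df$height_cm > 100)

print(n_missing)
print(n_duplicate_ids)
print(unexpected_labels)
print(df[out_of_range, ])
```

Storing each result in a named object, rather than only printing it, makes it easier to rerun the same checks after every correction and confirm that the counts drop to zero.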
3) Document every correction
Create a simple log in your script as comments:
# QC log
# 2026-07-02: corrected treatment typo 'Treatmnt' -> 'Treatment'
# 2026-07-02: removed duplicated row for plant_id 103
# 2026-07-02: converted height from mm to cm for batch B
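One way to keep the log and the corrections together is to place each log comment directly above the line of code that performs the change. The following is a minimal sketch under stated assumptions: the toy data and the `batch` column are hypothetical, and the duplicate removal keeps the first row for any repeated `plant_id` (in this toy data, only 103 is repeated).

```r
# Hypothetical toy data; the batch column is an assumption for illustration.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103, 104),
  batch     = c("A", "A", "A", "A", "B"),
  treatment = c("Control", "Treatmnt", "Treatment", "Treatment", "Control"),
  height_cm = c(12.5, 13.0, 14.1, 14.1, 131.0),
  stringsAsFactors = FALSE
)

# QC log
# 2026-07-02: corrected treatment typo 'Treatmnt' -> 'Treatment'
df$treatment[df$treatment == "Treatmnt"] <- "Treatment"

# 2026-07-02: removed duplicated row for plant_id 103
# (keeps the first row for each plant_id and drops later repeats)
df <- df[!duplicated(df$plant_id), ]

# 2026-07-02: converted height from mm to cm for batch B
is_batch_b <- df$batch == "B"
df$height_cm[is_batch_b] <- df$height_cm[is_batch_b] / 10

print(df)
```

Because the corrections live in the script rather than in a hand-edited spreadsheet, a collaborator can rerun the script on the raw file and arrive at the same cleaned data.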
4) Why this protects your results
- Prevents garbage-in-garbage-out analysis.
- Makes your process auditable by collaborators.
- Improves reproducibility for publication and peer review.
Next step: run the checks yourself with guided exercises:
Go to Practical Module 3
