Module 3 Presentation: Experimental Data Quality and Error Checks

This module teaches you how to detect data errors before they distort your conclusions. Quality checks are part of doing science, not optional cleanup.

Beginner outcome: you will run a repeatable quality checklist and document what you changed.

1) Typical quality problems

  • Missing values (empty cells or NA).
  • Inconsistent units (cm and mm mixed in one column).
  • Typos in categories (“Treatmnt” vs “Treatment”).
  • Out-of-range values (for example, negative counts, which are impossible).
  • Duplicate IDs (the same plant_id recorded more than once).
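
To make these problems concrete, here is a small toy data frame that contains each of them. All values are made up for illustration, and the leaf_count column is an assumption added only to show an impossible negative count.

```r
# Toy data (all values are made up for illustration).
df <- data.frame(
  plant_id   = c(101, 102, 103, 103, 104, 105),     # 103 appears twice
  treatment  = c("Control", "Treatment", "Treatmnt",
                 "Treatment", "Control", "Treatment"),  # one typo
  height_cm  = c(12.4, NA, 15.1, 15.1, 131, 14.2),  # NA = missing; 131 was entered in mm
  leaf_count = c(8, 7, 9, 9, -2, 6),                # hypothetical column; -2 is impossible
  stringsAsFactors = FALSE
)

# Look at the structure and the rows before running any checks.
str(df)
print(df)
```

Reading the printed table by eye works for six rows, but not for six hundred, which is why the next section uses short commands instead.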

2) Fast quality checklist

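# Assumes your data are already loaded in a data frame named df

# Missing values per column (any count above 0 needs a closer look)
colSums(is.na(df))

# Duplicate IDs: count the repeats, then list every affected row
sum(duplicated(df$plant_id))
df[duplicated(df$plant_id) | duplicated(df$plant_id, fromLast = TRUE), ]

# Category spelling check: a typo shows up as an unexpected extra category
sort(unique(df$treatment))

# Numeric range check: compare Min. and Max. against plausible values
summary(df$height_cm)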
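
One way to make the checklist repeatable is to wrap it in a function, so the same checks run identically on every data set. The sketch below is only one possible design: the function name run_qc, its arguments, and the plausible height range of 0 to 300 cm are assumptions for illustration, not part of the module.

```r
# Hypothetical helper: bundles the four checks into one repeatable call.
run_qc <- function(df, id_col, category_col, numeric_col, valid_range) {
  values <- df[[numeric_col]]
  # Missing values are reported separately, so they do not count as out of range.
  out_of_range <- !is.na(values) &
    (values < valid_range[1] | values > valid_range[2])
  list(
    missing_per_column = colSums(is.na(df)),
    duplicate_ids      = unique(df[[id_col]][duplicated(df[[id_col]])]),
    category_counts    = table(df[[category_col]], useNA = "ifany"),
    out_of_range_rows  = which(out_of_range)
  )
}

# Toy data (made-up values) with one problem of each kind.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103, 104),
  treatment = c("Control", "Treatment", "Treatmnt", "Treatmnt", "Control"),
  height_cm = c(12.4, NA, 15.1, 15.1, -3.0),
  stringsAsFactors = FALSE
)

# The range c(0, 300) is an assumed plausible range for plant height in cm.
qc <- run_qc(df, "plant_id", "treatment", "height_cm", valid_range = c(0, 300))
print(qc)
```

Returning a list rather than printing inside the function lets you save the result or compare it before and after cleaning.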

3) Document every correction

Keep a simple dated log as comments in your script, with one line per correction:

# QC log
# 2026-07-02: corrected treatment typo 'Treatmnt' -> 'Treatment'
# 2026-07-02: removed duplicated row for plant_id 103
# 2026-07-02: converted height from mm to cm for batch B
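
The log is most useful when each entry sits directly above the code that makes the change. The sketch below applies the three logged corrections to toy data. The values are made up, and the batch column is an assumption, since the log mentions batch B but the module does not show how batches are stored.

```r
# Toy data (made-up values); the batch column is a hypothetical addition.
df <- data.frame(
  plant_id  = c(101, 102, 103, 103, 104, 105),
  batch     = c("A", "A", "A", "A", "B", "B"),
  treatment = c("Control", "Treatment", "Treatmnt",
                "Treatmnt", "Control", "Treatment"),
  height_cm = c(12.4, 13.0, 15.1, 15.1, 131, 142),  # batch B was entered in mm
  stringsAsFactors = FALSE
)

# QC log
# 2026-07-02: corrected treatment typo 'Treatmnt' -> 'Treatment'
df$treatment[df$treatment == "Treatmnt"] <- "Treatment"

# 2026-07-02: removed duplicated row for plant_id 103
# (drops only rows that are identical in every column)
df <- df[!duplicated(df), ]

# 2026-07-02: converted height from mm to cm for batch B
is_batch_b <- df$batch == "B"
df$height_cm[is_batch_b] <- df$height_cm[is_batch_b] / 10

# Re-run the checks to confirm the corrections worked.
sum(duplicated(df$plant_id))
sort(unique(df$treatment))
summary(df$height_cm)
```

Run the corrections on a copy loaded by the script and leave the raw data file untouched, so anyone can rerun the script and get the same cleaned data.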

4) Why this protects your results

  • Prevents garbage-in-garbage-out analysis.
  • Makes your process auditable by collaborators.
  • Improves reproducibility for publication and peer review.