(2) Core Data Structures and Summaries – Presentation

This module explains the containers you will use constantly in R and how to summarize them without losing meaning.

Beginner outcome: you will know the difference between vectors, factors, lists, and data frames, and when to use each.

1) Data structures in plain language

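  • Vector: a one-dimensional sequence of values that all share one type (numeric, character, or logical).
  • Factor: categorical values restricted to a fixed set of levels (e.g., Control/Treatment).
  • List: an ordered collection that can hold objects of different types and lengths, including other lists and data frames.
  • Data frame: a table whose columns are equal-length vectors; each column has one type, but columns can differ.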
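A quick way to see these definitions in action is to build a tiny example of each and ask R what it made. This is a minimal base R sketch; the object names (mixed, groups, info, tbl) are illustrative only.

```r
# A vector holds one type; mixing types coerces everything to the most general one
mixed <- c(1, "two", TRUE)
class(mixed)          # "character": the number and the logical became text

# A factor stores categories plus a fixed set of allowed levels
groups <- factor(c("Control", "Treatment", "Control"))
levels(groups)        # "Control" "Treatment"

# A list can hold objects of different types and lengths
info <- list(label = "example", values = 1:3, flag = FALSE)
sapply(info, class)   # label: "character", values: "integer", flag: "logical"

# A data frame is a list of equal-length column vectors, one type per column
tbl <- data.frame(x = 1:3, g = groups)
sapply(tbl, class)    # x: "integer", g: "factor"
```

The silent coercion in the first line is why a single stray text entry can turn a whole numeric column into character data.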

2) Build each structure yourself

# Vector
heights <- c(12.1, 12.8, 13.4, 11.9)

# Factor
treatment <- factor(c("Control", "Treatment", "Treatment", "Control"))

# Data frame
df <- data.frame(plant_id = 1:4, treatment = treatment, height_cm = heights)

# List
bundle <- list(metadata = "trial-01", data = df, n = nrow(df))
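Once the objects exist, a few base R calls confirm what they are and pull pieces back out. The sketch below rebuilds the same objects so it runs on its own; the indexing shown is standard base R.

```r
# Rebuild the objects from section 2 so this block runs on its own
heights <- c(12.1, 12.8, 13.4, 11.9)
treatment <- factor(c("Control", "Treatment", "Treatment", "Control"))
df <- data.frame(plant_id = 1:4, treatment = treatment, height_cm = heights)
bundle <- list(metadata = "trial-01", data = df, n = nrow(df))

# Vectors: index by position
heights[2]                 # 12.8

# Factors: levels() shows the allowed categories, as.integer() the internal codes
levels(treatment)          # "Control" "Treatment"
as.integer(treatment)      # 1 2 2 1

# Data frames: one column with $, a subset of rows with [rows, cols]
df$height_cm
df[df$treatment == "Treatment", ]

# Lists: [[ ]] or $ returns the element itself; [ ] returns a smaller list
bundle$n                   # 4
class(bundle[["data"]])    # "data.frame"
class(bundle["data"])      # "list"
```

The difference between `[[ ]]` and `[ ]` on lists is a common source of confusing errors: only `[[ ]]` gives you back the data frame itself.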

3) Read and summarize correctly

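str(df)               # structure: column names, types, and first values
summary(df)           # per column: min/quartiles/mean for numbers, counts for factors
mean(df$height_cm)    # mean of one numeric column
table(df$treatment)   # number of rows in each factor level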
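An overall mean can hide differences between groups, which is one way a summary "loses meaning". A minimal sketch of per-group summaries with base R's tapply() and aggregate(), using the same four plants:

```r
heights <- c(12.1, 12.8, 13.4, 11.9)
treatment <- factor(c("Control", "Treatment", "Treatment", "Control"))
df <- data.frame(plant_id = 1:4, treatment = treatment, height_cm = heights)

# Overall mean mixes both groups together
mean(df$height_cm)                           # 12.55

# Mean height within each treatment level (a named result)
group_means <- tapply(df$height_cm, df$treatment, mean)
group_means                                  # Control: 12.0, Treatment: 13.1

# The same summary as a data frame, convenient for reports or plotting
aggregate(height_cm ~ treatment, data = df, FUN = mean)

# Report group sizes next to group means
table(df$treatment)
```

With only two plants per group these numbers are illustrative, which is exactly why the group counts belong next to the means.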

4) Why structure matters scientifically

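  • If categories are stored as plain text instead of factors, you lose control of level order, which sets the reference group in models and the order in tables and plots.
  • If numbers are imported as text, calculations fail (mean() returns NA with a warning); if that text becomes a factor, as.numeric() silently returns level codes instead of the measured values.
  • If IDs are inconsistent (e.g., "P02" vs "p02", or a trailing space), merge() silently drops the rows that do not match exactly.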
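The three risks above can be reproduced in a few lines of base R. The values and plant IDs below are made up for illustration; relevel(), as.numeric(), and merge() are standard base R functions.

```r
# 1) Level order: a factor lets you choose the reference group for models
trt <- factor(c("Treatment", "Control", "Treatment", "Control"))
levels(trt)                              # "Control" "Treatment" (alphabetical by default)
trt2 <- relevel(trt, ref = "Treatment")
levels(trt2)                             # "Treatment" "Control"

# 2a) Numbers stored as text: arithmetic fails
height_txt <- c("12.1", "12.8", "13.4", "11.9")
mean(height_txt)                         # NA, with a warning
mean(as.numeric(height_txt))             # 12.55 after converting

# 2b) Numbers stored as a factor: as.numeric() returns level codes, not values
height_fac <- factor(c("20", "5", "10"))
as.numeric(height_fac)                   # 2 3 1 (levels sort as text: "10", "20", "5")
as.numeric(as.character(height_fac))     # 20 5 10 (the real values)

# 3) Inconsistent IDs: merge() keeps only exact matches by default
plants <- data.frame(plant_id = c("P01", "P02", "P03"), height_cm = c(12.1, 12.8, 13.4))
soil   <- data.frame(plant_id = c("P01", "p02", "P03 "), soil_ph = c(6.5, 6.8, 6.2))
joined <- merge(plants, soil, by = "plant_id")
nrow(joined)                             # 1: "p02" and "P03 " did not match, and no error was raised
```

Case 2b is the most dangerous of the three because the wrong numbers look plausible and no warning appears.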

5) Beginner safety checks before analysis

  1. Run str(df) and confirm each column type.
  2. Run summary(df) and look for impossible values (e.g., negative heights) and unexpected NA counts.
  3. Run table() on each categorical column to spot typos such as "control" vs "Control".
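The three checks can be bundled into one reusable function. check_data() below is a hypothetical helper, not part of base R or any package, and it assumes that numeric columns should never be negative, which fits heights but not every variable.

```r
# check_data() is a hypothetical helper; it assumes negative numbers are impossible
check_data <- function(df) {
  problems <- character(0)
  str(df)                                          # check 1: column types
  print(summary(df))                               # check 2: ranges and NA counts
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) && any(x < 0, na.rm = TRUE)) {
      problems <- c(problems, paste0(col, ": negative values"))
    }
    if (is.factor(x) || is.character(x)) {
      print(table(x, useNA = "ifany", dnn = col))  # check 3: category counts
      # Flag labels that differ only by case or surrounding spaces
      vals <- unique(as.character(x[!is.na(x)]))
      if (any(duplicated(tolower(trimws(vals))))) {
        problems <- c(problems, paste0(col, ": labels differ only by case or spaces"))
      }
    }
  }
  invisible(problems)
}

# Example data with a deliberate typo ("treatment") and an impossible height
df <- data.frame(
  plant_id = 1:4,
  treatment = factor(c("Control", "Treatment", "treatment", "Control")),
  height_cm = c(12.1, -1, 13.4, 11.9)
)
problems <- check_data(df)
problems
```

Returning the problems invisibly keeps the console output readable while still letting a script stop when the vector is not empty.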