Module 2 Presentation: Core Data Structures and Summaries
This module explains the containers you will use constantly in R and how to summarize them without losing meaning.
Beginner outcome: you will know the difference between vectors, factors, lists, and data frames, and when to use each.
1) Data structures in plain language
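- Vector: a one-dimensional sequence of values that all share one type (numeric, character, or logical).
- Factor: a categorical variable with a fixed set of levels (e.g., Control/Treatment).
- List: a container whose elements can differ in type and length.
- Data frame: a table of rows and columns, where each column is a vector or factor of the same length.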
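The definitions above can be checked directly in the console. The following is a minimal sketch; the object names (mixed, group, stuff, tbl) are made up for illustration and are not used elsewhere in this module.

```r
# A vector holds one type; mixing types coerces every value to the most flexible one.
mixed <- c(1, "a", TRUE)
class(mixed)            # "character"

# A factor stores categories as integer codes plus a fixed set of levels.
group <- factor(c("Control", "Treatment", "Control"))
levels(group)           # "Control" "Treatment"
as.integer(group)       # 1 2 1

# A list can hold elements of different types and lengths.
stuff <- list(label = "trial", values = c(1.5, 2.5), ok = TRUE)
class(stuff$values)     # "numeric"

# A data frame is a list of equal-length columns.
tbl <- data.frame(id = 1:2, score = c(0.5, 0.9))
is.list(tbl)            # TRUE
```

The coercion in the first example is why a single stray text value can turn a whole numeric column into character.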
2) Build each structure yourself
# Vector
heights <- c(12.1, 12.8, 13.4, 11.9)
# Factor
treatment <- factor(c("Control", "Treatment", "Treatment", "Control"))
# Data frame
df <- data.frame(plant_id = 1:4, treatment = treatment, height_cm = heights)
# List
bundle <- list(metadata = "trial-01", data = df, n = nrow(df))
3) Read and summarize correctly
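# Column types and a preview of the values
str(df)
# Per-column summary (ranges for numbers, counts for factors)
summary(df)
# Mean of one numeric column
mean(df$height_cm)
# Counts per category
table(df$treatment)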
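An overall mean can hide the difference between groups, so one way to summarize without losing meaning is to compute the mean within each treatment. The sketch below rebuilds the same df so it runs on its own; tapply() and aggregate() are base R, and the name group_means is only illustrative.

```r
# Rebuild the example data so this block is self-contained
heights <- c(12.1, 12.8, 13.4, 11.9)
treatment <- factor(c("Control", "Treatment", "Treatment", "Control"))
df <- data.frame(plant_id = 1:4, treatment = treatment, height_cm = heights)

# Overall mean across all four plants
mean(df$height_cm)                      # 12.55

# Mean height within each treatment group
group_means <- tapply(df$height_cm, df$treatment, mean)
group_means                             # Control 12.0, Treatment 13.1

# The same grouped summary, returned as a data frame
aggregate(height_cm ~ treatment, data = df, FUN = mean)
```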
4) Why structure matters scientifically
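- If categories are stored as text instead of factors, the set and order of levels is left implicit, so summaries, plots, and model reference levels may differ from what you intended.
- If numbers are imported as text, arithmetic fails with an error or returns NA, and sorting or comparison follows text order rather than numeric order.
- If IDs are inconsistent (for example, differing case or stray spaces), joins and merges can drop or mismatch rows without any warning.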
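Each of these three problems can be reproduced in a few lines. The sketch below uses made-up data (raw, plants, sites and their values are hypothetical) purely to show the behaviour in base R.

```r
# 1) Numbers imported as text: convert explicitly and check what became NA
raw <- c("12.1", "9.8", "n/a")
values <- suppressWarnings(as.numeric(raw))   # "n/a" cannot be parsed and becomes NA
sum(is.na(values))                            # 1
mean(values, na.rm = TRUE)                    # 10.95

# 2) Text versus factor: a factor makes levels and their order explicit
arm_text <- c("Treatment", "Control", "Treatment")
summary(arm_text)                             # reports length, class, and mode, not counts
arm <- factor(arm_text, levels = c("Control", "Treatment"))
summary(arm)                                  # Control 1, Treatment 2

# 3) Inconsistent IDs: merge() keeps only matching rows by default
plants <- data.frame(plant_id = c("P1", "P2", "P3"), height_cm = c(12.1, 12.8, 13.4))
sites  <- data.frame(plant_id = c("P1", "p2", "P3"), site = c("A", "B", "A"))
joined <- merge(plants, sites, by = "plant_id")
nrow(joined)                                  # 2, the "P2"/"p2" row is dropped without a warning
setdiff(plants$plant_id, sites$plant_id)      # "P2"
```

Comparing row counts before and after a merge, as in the last step, is a cheap way to notice dropped records.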
5) Beginner safety checks before analysis
- Run str(df) and confirm each column type.
- Run summary(df) and look for impossible values.
- Run table() for categories to spot typos.
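The three checks might look like this on a small data frame that contains one impossible value and one category typo. The data here is invented for illustration, and the rule that heights must be positive is an assumption about this example, not a general threshold.

```r
# Hypothetical data with two deliberate problems: a negative height and a lowercase label
df <- data.frame(
  plant_id  = 1:5,
  treatment = c("Control", "Treatment", "treatment", "Control", "Treatment"),
  height_cm = c(12.1, 12.8, 13.4, -1, 11.9)
)

# Check 1: confirm each column type
str(df)
col_types <- sapply(df, class)

# Check 2: look for impossible values (assumed rule: heights must be positive)
summary(df$height_cm)
impossible <- df[which(df$height_cm <= 0), ]
impossible

# Check 3: count categories; an unexpected extra level usually signals a typo
category_counts <- table(df$treatment)
category_counts
```

Here table() shows three categories where two were expected, which points to the misspelled "treatment" label.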
Next step: practice these checks immediately:
Go to Practical Module 2
