Data Handling
We collect, organise, and represent data using various graphs. We calculate and interpret measures of central tendency (mean, median, mode) and spread (range, quartiles), and draw scatter plots to investigate relationships between two variables.
7.1 Collecting and Representing Data
- Identify data sources and distinguish between populations and samples
- Represent data in frequency tables, histograms, bar charts, pie charts, and line graphs
- Identify misuse of statistics in the media
Real-World Connection
Data is everywhere: sports statistics, election polls, weather forecasts, and medical research. The way data is represented strongly affects how it is perceived. A misleading axis scale on a bar chart, such as a vertical axis that starts well above zero, can make a 2% increase look like a 200% jump. Reading graphs critically is an essential life skill in the information age.
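To see how a 2% increase can be made to look like a 200% jump, the short Python sketch below compares the bar heights that are actually drawn when the vertical axis starts at zero and when it starts just below the smaller value. The values 100 and 102 and the axis start of 99 are hypothetical numbers chosen only for illustration.

```python
def apparent_ratio(old: float, new: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the vertical axis begins at axis_start."""
    return (new - axis_start) / (old - axis_start)

old, new = 100, 102  # hypothetical values: a 2% increase
true_increase = (new - old) / old * 100

# Axis starting at zero: bars are drawn with heights 100 and 102
honest = (apparent_ratio(old, new) - 1) * 100

# Axis starting at 99: bars are drawn with heights 1 and 3
truncated = (apparent_ratio(old, new, axis_start=99) - 1) * 100

print(f"true increase: {true_increase:.0f}%")
print(f"apparent increase, axis from 0: {honest:.0f}%")
print(f"apparent increase, axis from 99: {truncated:.0f}%")
```

With the truncated axis, the bar for 102 is drawn three times as tall as the bar for 100, so it looks 200% larger even though the real increase is only 2%.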
Definition
Population vs Sample
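A population is the entire group being studied. A sample is a subset of the population, chosen so that it represents the whole group as fairly as possible. We use samples when studying the whole population is impractical, for example because it would be too costly or take too long.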
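One common way to choose a fair sample is simple random sampling, where every member of the population has the same chance of being selected. The Python sketch below illustrates this under stated assumptions: the "population" is a list of 1 000 ages generated by the program itself (not real data), and the sample size of 50 is arbitrary.

```python
import random
import statistics

# Hypothetical population: ages of 1 000 learners (generated, not real data)
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(13, 16) for _ in range(1000)]

# Simple random sample: each learner is equally likely to be chosen,
# and nobody is chosen twice
sample = rng.sample(population, k=50)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```

Comparing the two means shows how well the sample summarises the population; a biased choice (for example, only asking one class) could give a very different result.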
Types of Graphs
Graph type: best used for (advantage)
- Bar chart: comparing discrete categories (clear visual comparison)
- Histogram: continuous data grouped into class intervals (shows the shape of the distribution)
- Pie chart: showing parts of a whole, such as percentages (proportions are easy to read)
- Line graph: trends over time (shows change and direction)
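A frequency table for grouped data is the starting point for a histogram: count how many values fall into each class interval, and each count becomes the height of one bar. The Python sketch below assumes a hypothetical set of test marks and a class width of 10; it prints the table together with a rough text histogram (one # per value).

```python
from collections import Counter

# Hypothetical test marks (illustrative data only, all between 0 and 49)
marks = [12, 25, 31, 47, 38, 22, 19, 33, 41, 28, 35, 9, 44, 30, 27]

WIDTH = 10  # assumed class width: intervals 0-9, 10-19, ..., 40-49

def class_interval(mark: int) -> tuple[int, int]:
    """Return the (lower, upper) limits of the interval that contains mark."""
    lower = (mark // WIDTH) * WIDTH
    return lower, lower + WIDTH - 1

# Count how many marks fall into each interval
freq = Counter(class_interval(m) for m in marks)

print("Interval  Bar     Frequency")
for lower in range(0, 50, WIDTH):
    f = freq[(lower, lower + WIDTH - 1)]
    print(f"{lower:>2}-{lower + WIDTH - 1:<5} {'#' * f:<7} {f}")
```

The frequencies must add up to the number of data values (15 here), which is a quick check that no value was missed or counted twice.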
Worked Example
Building a frequency table and histogram
Problem
Worked Example
Identifying misleading statistics
Problem
Worked Example
Reading and interpreting a pie chart
Problem
CAPS Cognitive Level Distribution
7.2 Measures of Central Tendency and Spread
- Calculate and interpret the mean, median, and mode
- Calculate the range and inter-quartile range, and identify outliers
- Draw and interpret box-and-whisker plots
Real-World Connection
Quartiles and box plots are used in medicine, business, and sport. A doctor checking whether a child's growth is normal compares the child's measurements with growth curves built from percentiles, including the quartiles (25th, 50th, and 75th percentiles). The inter-quartile range (IQR) shows how widely the middle 50% of the data is spread: a narrow IQR means consistent results, while a wide IQR means high variability.
Definition
Measures of Central Tendency
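Values that represent the 'centre' of a data set. The mean is the sum of the values divided by the number of values; the median is the middle value when the data is arranged in order (the average of the two middle values when there is an even number of values); the mode is the value that occurs most often.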
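The three measures can be checked with Python's standard `statistics` module. The sketch below uses a hypothetical data set (goals scored in nine matches) and then adds one extreme value to show why the median is often preferred when the data contains an outlier.

```python
import statistics

# Hypothetical data set: goals scored by a team in 9 matches
goals = [2, 0, 1, 3, 1, 4, 1, 2, 0]

mean = statistics.mean(goals)      # sum of the values / number of values
median = statistics.median(goals)  # middle value once the data is sorted
mode = statistics.mode(goals)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")

# One extreme value pulls the mean much further than the median
with_outlier = goals + [20]
print(f"with outlier: mean = {statistics.mean(with_outlier):.2f}, "
      f"median = {statistics.median(with_outlier)}")
```

Adding the single value 20 raises the mean from about 1.56 to 3.40, while the median only moves from 1 to 1.5, so the median still describes a "typical" match.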
Definition
Quartiles
Quartiles divide ordered data into four groups of equal size. Q1 = lower quartile (25th percentile), Q2 = median (50th percentile), Q3 = upper quartile (75th percentile).
Inter-Quartile Range (IQR)
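IQR = Q3 − Q1. It measures the spread of the middle 50% of the data and is less affected by outliers than the range (highest value − lowest value), because extreme values fall outside the middle half of the data.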
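The Python sketch below computes the five-number summary used to draw a box-and-whisker plot (minimum, Q1, median, Q3, maximum), together with the range and IQR. Two assumptions are labelled in the code: quartiles are found as the median of the lower and upper halves, leaving out the middle value when n is odd (textbooks and software packages differ slightly on this), and outliers are flagged with the widely used 1.5 × IQR rule, which is not stated in this section. The scores are hypothetical.

```python
import statistics

def quartiles(data: list[float]) -> tuple[float, float, float]:
    """Return (Q1, Q2, Q3) using the 'median of each half' method.

    Assumption: when n is odd, the median is left out of both halves.
    Needs at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # values below the median
    upper = xs[half + n % 2:]  # values above the median
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

# Hypothetical data set: 11 test scores
scores = [42, 55, 58, 60, 61, 63, 65, 68, 70, 74, 98]

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
data_range = max(scores) - min(scores)

# Assumed convention: an outlier lies more than 1.5 x IQR below Q1 or above Q3
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in scores if x < low_fence or x > high_fence]

print("five-number summary:", min(scores), q1, q2, q3, max(scores))
print("range =", data_range, " IQR =", iqr, " outliers =", outliers)
```

Here Q1 = 58, the median is 63 and Q3 = 70, so the IQR is 12. The single score of 98 stretches the range to 56, but it does not change the IQR, which is why the IQR is the more reliable measure of spread when outliers are present.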
Worked Example
Finding quartiles and drawing a box plot
Problem
Worked Example
Mean, median, mode — which to use?
Problem
Worked Example
Effect of changing data on measures
Problem
CAPS Cognitive Level Distribution
7.3 Scatter Plots and Correlation
- Draw scatter plots for bivariate data
- Identify positive, negative, and no correlation from scatter plots
- Draw a line of best fit and use it to make predictions
Real-World Connection
Scatter plots reveal relationships between two variables. Do learners who study longer tend to get better marks? A scatter plot of 'hours studied' against 'test score' would typically show a positive correlation. Do taller people tend to have larger shoe sizes? Again, we would expect a positive correlation. Climate scientists use scatter plots to show the correlation between CO₂ levels and global temperature, and correlation underpins much of the predictive analysis used in business and medicine.
Types of Correlation
Type: description (meaning)
- Strong positive: points lie close to a line rising from left to right (as x increases, y increases)
- Weak positive: points are loosely scattered with a general upward trend (a mild positive relationship)
- Negative: points fall from left to right (as x increases, y decreases)
- No correlation: no discernible pattern (no clear relationship between x and y)
⚠️ Warning
Correlation does NOT imply causation. Just because two variables are correlated doesn't mean one causes the other. Example: ice cream sales and drowning rates both rise in summer — but ice cream doesn't cause drowning. Both are caused by a third factor: hot weather.
Worked Example
Drawing and interpreting a scatter plot
Problem
Worked Example
Using line of best fit to predict
Problem
Worked Example
Identifying correlation type
Problem
CAPS Cognitive Level Distribution