Statistics is the branch of mathematics concerned with the collection, organisation, analysis, and
interpretation of data. In the DSE compulsory syllabus, we focus on descriptive statistics --
summarising a dataset through measures of central tendency and measures of dispersion. This page
also covers grouped data techniques and graphical representations such as box-and-whisker plots.
These tools are frequently combined with probability concepts in exam questions.
The mean (arithmetic average) of a dataset $\{x_1, x_2, \ldots, x_n\}$ is defined as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The mean uses every data value, making it sensitive to outliers. It is the only measure of central
tendency that lends itself to algebraic manipulation (e.g., combining datasets).
Examples
The scores of 5 students are $72, 85, 90, 68, 80$. The mean is $\bar{x} = \frac{72+85+90+68+80}{5} = \frac{395}{5} = 79$.
If every score is increased by 5 bonus marks, the new mean is $79 + 5 = 84$.
The mode is the value that occurs most frequently in a dataset. A dataset may be unimodal (one
mode), bimodal (two modes), multimodal, or have no mode at all.
The mode is the only measure of central tendency applicable to nominal (categorical) data.
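The classification above can be sketched in a few lines of Python. The helper name `modes` is hypothetical, and the sketch assumes one particular convention: a dataset in which every value occurs exactly once is reported as having no mode.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes of a non-empty dataset.

    Assumption for this sketch: when every value occurs exactly once and
    there is more than one distinct value, the dataset has no mode.
    """
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([72, 85, 90, 68, 80]))    # all values distinct
print(modes([3, 7, 7, 2, 9]))         # unimodal
print(modes([1, 1, 2, 2, 3]))         # bimodal
print(modes(["red", "blue", "red"]))  # nominal (categorical) data
```

Because the function only counts occurrences, it works for categorical labels as well as numbers, which is why the mode applies to nominal data.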
Measures of dispersion (spread) quantify how far individual data values deviate from the centre. Two
datasets can share the same mean yet have very different spreads.
Variance measures the average squared deviation from the mean. There are two versions depending on
whether the data represents the entire population or a sample drawn from a larger
population.
Population variance (divides by $n$):
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
Sample variance (divides by $n-1$):
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
An equivalent computational formula for the population variance is:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2$$
Why $n$ vs $n-1$? Dividing by $n-1$ (Bessel's correction) provides an unbiased estimator of
the population variance when working with a sample. The sample mean $\bar{x}$ is computed from
the same $n$ data points, so it lies closer to them than the true population mean $\mu$ does, and
the squared deviations tend to underestimate the true spread. Dividing by $n-1$ compensates for
this. In the DSE syllabus, unless the problem explicitly identifies the data as a sample, the
population formula (dividing by $n$) is typically expected.
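As a minimal sketch of the difference, the two divisors can be compared on the five scores from the mean example. The variable names are illustrative; `pvariance` and `variance` are the population and sample versions in Python's standard `statistics` module.

```python
from statistics import pvariance, variance

scores = [72, 85, 90, 68, 80]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean, shared by both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)

pop_var = sum_sq_dev / n          # population variance: divide by n
samp_var = sum_sq_dev / (n - 1)   # sample variance: divide by n - 1

print(pop_var, pvariance(scores))
print(samp_var, variance(scores))
```

The sample variance is always the larger of the two, by a factor of $\frac{n}{n-1}$.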
Class boundaries: The endpoints of each class interval, with no gaps between consecutive
classes. For example, if raw intervals are 10--19 and 20--29, the class boundaries are
9.5--19.5 and 19.5--29.5.
Class width: The difference between the upper and lower class boundaries.
Class mark (midpoint): $x_i = \frac{\text{lower boundary} + \text{upper boundary}}{2}$,
used as the representative value for all data in the class.
When class marks are equally spaced, let $h$ be the common class width and $A$ be the class mark of
a convenient class (the assumed mean). Define $d_i = \frac{x_i - A}{h}$. Then:
$$\bar{x} = A + \frac{\sum_{i=1}^{k} f_i d_i}{\sum_{i=1}^{k} f_i} \times h$$
This method simplifies calculation by working with small integer values of $d_i$.
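A short sketch of the assumed mean method, using the apple-weight distribution from a question on this page (classes 100--119 up to 180--199, frequencies 6, 14, 20, 8, 2). Choosing $A = 149.5$ is an assumption of this example; any class mark would work.

```python
# Class marks and frequencies for the apple-weight distribution.
marks = [109.5, 129.5, 149.5, 169.5, 189.5]
freqs = [6, 14, 20, 8, 2]

A = 149.5   # assumed mean: class mark of the middle class (a free choice)
h = 20      # common class width

# Coded deviations d_i = (x_i - A) / h, which come out as -2, -1, 0, 1, 2.
d = [(x - A) / h for x in marks]

total_f = sum(freqs)
coded_mean = A + sum(f * di for f, di in zip(freqs, d)) / total_f * h

# Direct calculation for comparison.
direct_mean = sum(f * x for f, x in zip(freqs, marks)) / total_f

print(coded_mean, direct_mean)
```

Both routes give the same estimate; the coded version only needs the products $f_i d_i$, which are small integers here.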
Examples
The following frequency distribution records the marks of 40 students:
In a histogram, the area of each bar represents the frequency of the corresponding class. If class
widths are unequal, the height of each bar is the frequency density:
$$\text{Frequency density} = \frac{\text{Frequency}}{\text{Class width}}$$
The median, quartiles, and other percentiles can be estimated from a cumulative frequency curve
(ogive) by linear interpolation within the relevant class.
$\sigma^2 = 1.2384 \times 10^2 = 123.84$, so $\sigma = \sqrt{123.84} \approx 11.13$.
Question: Two classes sat the same test. Class A ($n_1 = 30$, $\bar{x}_1 = 72$,
$\sigma_1 = 8$). Class B ($n_2 = 20$, $\bar{x}_2 = 80$, $\sigma_2 = 6$). Find the combined mean and
combined standard deviation.
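One way to approach this kind of question is to recover each group's $\sum x$ and $\sum x^2$ from its summary statistics and then pool them. The sketch below assumes both standard deviations are population values (divisor $n$); the function name `combine` is hypothetical.

```python
from math import sqrt

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Pool two groups described by size, mean and population SD."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # From sigma^2 = (sum of x^2)/n - mean^2, each group's sum of squares
    # is n * (sigma^2 + mean^2).
    sum_sq = n1 * (sd1 ** 2 + mean1 ** 2) + n2 * (sd2 ** 2 + mean2 ** 2)
    variance = sum_sq / n - mean ** 2
    return mean, sqrt(variance)

mean_ab, sd_ab = combine(30, 72, 8, 20, 80, 6)
print(round(mean_ab, 2), round(sd_ab, 2))
```

The combined variance exceeds the weighted average of the two group variances because the group means differ, which adds between-group spread.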
Question: The following are the lifetimes (in hours) of 10 light bulbs:
$820, 790, 810, 780, 830, 800, 795, 815, 805, 855$. Determine the range, IQR, and identify any
outliers.
No outliers (all values lie within $[757.5, 857.5]$).
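The fences quoted above can be reproduced with a short script. This sketch assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, and that an outlier is any value more than $1.5 \times \text{IQR}$ beyond a quartile; the helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Q1, median, Q3 via medians of the lower and upper halves (n >= 2)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]    # for odd n the middle value is left out of both halves
    upper = s[-half:]
    return median(lower), median(s), median(upper)

lifetimes = [820, 790, 810, 780, 830, 800, 795, 815, 805, 855]

q1, q2, q3 = quartiles(lifetimes)
data_range = max(lifetimes) - min(lifetimes)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in lifetimes if x < lower_fence or x > upper_fence]

print(data_range, iqr, (lower_fence, upper_fence), outliers)
```

Other quartile conventions (for example, interpolated percentiles) can give slightly different fences, so the convention should match the one used in class.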
Question: A farmer records the yields (in kg) of two varieties of wheat over several seasons.
Variety A: mean $= 45$, standard deviation $= 5$. Variety B: mean $= 60$, standard deviation $= 9$.
Which variety has more consistent yield?
Answer
$\text{CV}_A = \frac{5}{45} \times 100\% \approx 11.1\%$.
$\text{CV}_B = \frac{9}{60} \times 100\% = 15.0\%$.
Since $\text{CV}_A < \text{CV}_B$, Variety A has more consistent (less variable) yield relative to its mean.
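Question: Given the dataset $\{a, b, c\}$ with mean 10 and variance 8, find the value of
$a^2 + b^2 + c^2$.
Answer
$\bar{x} = \frac{a+b+c}{3} = 10 \implies a + b + c = 30$.
$\sigma^2 = \frac{a^2+b^2+c^2}{3} - \bar{x}^2 = 8$.
$\frac{a^2+b^2+c^2}{3} - 100 = 8 \implies a^2 + b^2 + c^2 = 324$.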
Question: A set of 20 numbers has mean 15 and standard deviation 3. If each number is
multiplied by 2 and then 5 is added, find the new mean and new standard deviation.
Answer
New mean: $2(15) + 5 = 35$.
New variance: $2^2 \times 3^2 = 36$.
New standard deviation: $\sqrt{36} = 6$.
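The rule used here, that $Y = aX + b$ has mean $a\bar{x} + b$ and standard deviation $|a|\sigma_X$, can be checked numerically. In this sketch the function name `transformed_stats` and the list `sample` are illustrative; `sample` is made-up data, not the 20 numbers from the question.

```python
from statistics import mean, pstdev

def transformed_stats(mean_x, sd_x, a, b):
    """Mean and SD of Y = a*X + b, given the mean and SD of X."""
    return a * mean_x + b, abs(a) * sd_x

# The question's summary statistics: mean 15, SD 3, multiply by 2, add 5.
new_mean, new_sd = transformed_stats(15, 3, 2, 5)
print(new_mean, new_sd)

# Numerical check on illustrative data.
sample = [11, 14, 15, 16, 19]
scaled = [2 * x + 5 for x in sample]
print(mean(scaled), 2 * mean(sample) + 5)
print(pstdev(scaled), 2 * pstdev(sample))
```

Adding the constant shifts every value equally, so it moves the mean but leaves the spread unchanged.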
Question: The histogram below (described verbally) shows the distribution of weights of 50
apples. The class intervals and frequencies are:

| Weight (g) | Frequency |
| --- | --- |
| 100 -- 119 | 6 |
| 120 -- 139 | 14 |
| 140 -- 159 | 20 |
| 160 -- 179 | 8 |
| 180 -- 199 | 2 |

Estimate the median weight from the cumulative frequency distribution.
Answer
Cumulative frequencies: $6, 20, 40, 48, 50$.
The median is the $\frac{50}{2} = 25$th value, which lies in the class 140--159 (cumulative
frequency 20 to 40).
Using linear interpolation within the class:
$$\text{Median} = 139.5 + \frac{25 - 20}{40 - 20} \times (159.5 - 139.5) = 139.5 + \frac{5}{20} \times 20 = 139.5 + 5 = 144.5 \text{ g}$$
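The interpolation step generalises to any grouped distribution. The function below is a hypothetical helper, written as a sketch under the same assumption as the answer: data are spread evenly within each class and the median is the $\frac{n}{2}$th value.

```python
def grouped_median(boundaries, freqs):
    """Estimate the median of grouped data by linear interpolation.

    boundaries: k + 1 class boundaries in increasing order
    freqs:      k class frequencies
    """
    target = sum(freqs) / 2
    cumulative = 0
    for i, f in enumerate(freqs):
        if f > 0 and cumulative + f >= target:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")

apple_boundaries = [99.5, 119.5, 139.5, 159.5, 179.5, 199.5]
apple_freqs = [6, 14, 20, 8, 2]
apple_median = grouped_median(apple_boundaries, apple_freqs)
print(apple_median)
```

Replacing `sum(freqs) / 2` with another fraction of the total gives quartiles and percentiles by the same method.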
Question: For the dataset $\{3, 7, 7, 2, 9, 5, 1, 8, 6, 4\}$, find $\sum x_i$, $\sum x_i^2$,
the mean, and the population variance. Verify your variance using both the definition formula and
the computational formula.
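A hand calculation for this question can be checked with a few lines of Python. This is only a sketch for verification, with illustrative variable names; it evaluates the variance both from the definition and from the computational formula.

```python
data = [3, 7, 7, 2, 9, 5, 1, 8, 6, 4]
n = len(data)

total = sum(data)                      # sum of x_i
total_sq = sum(x * x for x in data)    # sum of x_i squared
mean = total / n

# Definition: average squared deviation from the mean.
var_definition = sum((x - mean) ** 2 for x in data) / n

# Computational formula: mean of the squares minus square of the mean.
var_computational = total_sq / n - mean ** 2

print(total, total_sq, mean)
print(round(var_definition, 4), round(var_computational, 4))
```

The two variance values agree up to floating-point rounding, which is why the script rounds before printing.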
Question: The weekly wages (in dollars) of 8 workers in a small factory are
3200,3500,3800,4200,4500,4800,5200,12000. The factory owner claims the average wage is
USD 5150. Is this claim misleading? Explain using an appropriate measure of central tendency and
dispersion.
Answer
Mean: $\bar{x} = \frac{41200}{8} = 5150$. The owner's figure is arithmetically correct.
The median ($4350$) is a far more representative measure here. The single extreme value of
USD 12000 (likely the owner's own salary or a manager's) pulls the mean USD 800 above the median.
The median is resistant to outliers and better reflects what a typical worker earns.
The range ($12000 - 3200 = 8800$) and the large gap between the mean and median both indicate
significant skewness, confirming the mean is a poor choice of summary statistic.
Question: A set of data has variance 25 and mean 0. A new set is formed by removing the
value 10 from the original set. If the original set had n=6 values, find the new mean and new
variance.
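Questions of this type can be handled by converting the summary statistics back into $\sum x$ and $\sum x^2$, adjusting both totals, and recomputing. The helper `remove_value` below is hypothetical and assumes the population variance (divisor $n$).

```python
def remove_value(n, mean, variance, x):
    """New mean and population variance after deleting the value x."""
    total = n * mean                         # sum of the data
    total_sq = n * (variance + mean ** 2)    # sum of squares of the data
    new_n = n - 1
    new_mean = (total - x) / new_n
    new_variance = (total_sq - x ** 2) / new_n - new_mean ** 2
    return new_mean, new_variance

# The question's figures: n = 6, mean 0, variance 25, remove the value 10.
new_mean, new_variance = remove_value(6, 0, 25, 10)
print(new_mean, new_variance)
```

The same idea works for adding a value: add $x$ and $x^2$ to the totals and divide by $n + 1$.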
The DSE typically requires answers to be given to 3 significant figures unless the question specifies otherwise. Exact fractions are preferred when they arise naturally.
A dataset $\{2, 5, 8, 11, 14\}$ has mean $\bar{x} = 8$ and variance $\sigma^2 = 18$. Find the new mean and variance if the value 20 is added.
Solution
New $n' = 6$. New sum $= 40 + 20 = 60$. New mean $\bar{x}' = \frac{60}{6} = 10$.
$\sum x_i^2 = 4 + 25 + 64 + 121 + 196 = 410$. New $\sum x_i^2 = 410 + 400 = 810$.
New variance: $\sigma'^2 = \frac{810}{6} - 10^2 = 135 - 100 = 35$.
Worked Example 14: Standardised scores
In an exam, the mean is 60 and the standard deviation is 10. Student A scores 75 and Student B scores 55. Express each score as a standardised score (z-score).
Solution
$z_A = \frac{75 - 60}{10} = 1.5$
$z_B = \frac{55 - 60}{10} = -0.5$
Student A scored 1.5 standard deviations above the mean; Student B scored 0.5 standard deviations below.
Worked Example 15: Finding data from summary statistics
A dataset of 5 positive integers has mean 6 and variance 4. Find all possible datasets.
Solution
$\sum x_i = 30$ and $\frac{\sum x_i^2}{5} - 36 = 4 \implies \sum x_i^2 = 200$.
We need five positive integers summing to 30 with squares summing to 200.
A natural first guess, the symmetric set $\{4, 5, 6, 7, 8\}$, fails: its sum is 30 but $\sum x_i^2 = 16 + 25 + 36 + 49 + 64 = 190 \neq 200$.
Work instead with the deviations $d_i = x_i - 6$. These are integers with $\sum d_i = 0$ and $\sum d_i^2 = 5 \times 4 = 20$.
Each $d_i^2$ is one of $0, 1, 4, 9, 16$, so list the ways to write 20 as a sum of five such squares and check whether signs can be chosen so that the deviations sum to zero:
$16 + 1 + 1 + 1 + 1$: deviations $\{4, -1, -1, -1, -1\}$ or $\{-4, 1, 1, 1, 1\}$, giving $\{5, 5, 5, 5, 10\}$ and $\{2, 7, 7, 7, 7\}$.
$9 + 9 + 1 + 1 + 0$: deviations $\{3, -3, 1, -1, 0\}$, giving $\{3, 5, 6, 7, 9\}$.
$16 + 4 + 0 + 0 + 0$ and $4 + 4 + 4 + 4 + 4$: no choice of signs makes the deviations sum to zero.
Hence the possible datasets are $\{2, 7, 7, 7, 7\}$, $\{3, 5, 6, 7, 9\}$, and $\{5, 5, 5, 5, 10\}$.
Check for $\{3, 5, 6, 7, 9\}$: sum $= 30$ and $\sum x_i^2 = 9 + 25 + 36 + 49 + 81 = 200$.
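Trial and error is unreliable for this kind of problem, but the search space is small enough to enumerate completely. The sketch below is a brute-force check, not an exam technique: it lists every multiset of five positive integers with $\sum x_i = 30$ and $\sum x_i^2 = 200$.

```python
from itertools import combinations_with_replacement

TARGET_SUM = 30       # 5 values with mean 6
TARGET_SUM_SQ = 200   # from variance 4: 5 * (4 + 6**2)

# Each value is at least 1, so no value can exceed 30 - 4 = 26.
solutions = [
    candidate
    for candidate in combinations_with_replacement(range(1, 27), 5)
    if sum(candidate) == TARGET_SUM
    and sum(x * x for x in candidate) == TARGET_SUM_SQ
]

for dataset in solutions:
    print(dataset)
```

The enumeration finds exactly three datasets: $\{2, 7, 7, 7, 7\}$, $\{3, 5, 6, 7, 9\}$, and $\{5, 5, 5, 5, 10\}$, so the constraints are satisfiable with integers.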
Worked Example 16: Grouped data variance with coding
For the frequency distribution below, find the standard deviation using the assumed mean method.
DSE Practice 1. Two groups of students took the same test. Group A: $n_1 = 40$, $\bar{x}_1 = 65$, $\sigma_1 = 8$. Group B: $n_2 = 60$, $\bar{x}_2 = 72$, $\sigma_2 = 10$. Find the overall mean and standard deviation.
DSE Practice 2. The heights (in cm) of 8 students are: 158, 162, 165, 168, 170, 172, 175, 180. After converting to feet (1 cm = 0.03281 ft), find the mean and standard deviation in feet.
Solution
Let $Y = 0.03281X$. Then $\bar{y} = 0.03281\bar{x}$ and $\sigma_Y = 0.03281\sigma_X$.
$\bar{x} = \frac{158+162+165+168+170+172+175+180}{8} = \frac{1350}{8} = 168.75$ cm.
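The remaining arithmetic can be checked with a short script. This sketch assumes the population standard deviation is wanted; the variable names are illustrative and the conversion factor is the one quoted in the question.

```python
from statistics import mean, pstdev

CM_TO_FT = 0.03281  # conversion factor given in the question

heights_cm = [158, 162, 165, 168, 170, 172, 175, 180]

mean_cm = mean(heights_cm)
sd_cm = pstdev(heights_cm)   # population standard deviation

# Scaling rule: both the mean and the SD are multiplied by the factor.
mean_ft = CM_TO_FT * mean_cm
sd_ft = CM_TO_FT * sd_cm

# Cross-check: convert every height first, then summarise.
heights_ft = [CM_TO_FT * h for h in heights_cm]

print(round(mean_cm, 2), round(sd_cm, 3))
print(round(mean_ft, 3), round(sd_ft, 3))
print(round(mean(heights_ft), 3), round(pstdev(heights_ft), 3))
```

Converting the summary statistics and converting the raw data give the same results, which is the point of the scaling rule.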
DSE Practice 3. For the dataset {1,3,5,7,9,11,13}, find the mean deviation (mean absolute deviation) and compare it with the standard deviation.
Solution
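$\bar{x} = \frac{49}{7} = 7$.
Mean deviation $= \frac{|1-7| + |3-7| + |5-7| + |7-7| + |9-7| + |11-7| + |13-7|}{7} = \frac{6+4+2+0+2+4+6}{7} = \frac{24}{7} \approx 3.43$.
$\sigma^2 = \frac{1+9+25+49+81+121+169}{7} - 49 = \frac{455}{7} - 49 = 65 - 49 = 16$.
$\sigma = 4$.
The standard deviation (4) is greater than the mean deviation (3.43). In general the standard deviation is never smaller than the mean deviation, and the two are equal only when every value lies at the same distance from the mean.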
DSE Practice 4. A set of data has $\bar{x} = 50$ and $\sigma = 4$. If every value is increased by $k$, the new standard deviation becomes 10. Find $k$ and explain your answer.
Solution
Adding a constant $k$ does not change the standard deviation. Therefore, the new standard deviation should still be $\sigma = 4$, not 10.
There is no value of $k$ that changes the standard deviation from 4 to 10 by addition alone. To change the standard deviation, we would need to multiply by a constant. If $Y = aX + b$, then $\sigma_Y = |a|\sigma_X$. For $\sigma_Y = 10$: $|a| = \frac{10}{4} = 2.5$.
The question likely intends a multiplication, not just addition. If $Y = 2.5X + k$, then $\sigma_Y = 10$ for any $k$.
DSE Practice 5. The table shows the distribution of marks in a test.

| Marks | Frequency |
| --- | --- |
| 0 -- 19 | 4 |
| 20 -- 39 | 10 |
| 40 -- 59 | 22 |
| 60 -- 79 | 14 |
| 80 -- 100 | 5 |

Estimate the mean and standard deviation.
Solution
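Class marks: 9.5, 29.5, 49.5, 69.5, 90. Class widths: 20, 20, 20, 20, 21. The last class, 80--100, has boundaries 79.5 and 100.5, so its class mark is 90 rather than 89.5.
For the coding method, take $A = 49.5$ and $h = 20$.
$d_i$: $-2$, $-1$, $0$, $1$, $2.025$. The last value is not an integer because the last class is wider; treating it as 2 gives only an approximation.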
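Because the last class is wider than the others, a direct calculation from the class marks avoids the approximation in the coding step. The sketch below assumes class boundaries of $-0.5, 19.5, \ldots, 100.5$ (so the last class mark comes out as 90) and uses the population formula for the variance.

```python
from math import sqrt

# Assumed class boundaries for 0--19, 20--39, 40--59, 60--79, 80--100.
boundaries = [-0.5, 19.5, 39.5, 59.5, 79.5, 100.5]
freqs = [4, 10, 22, 14, 5]

# Class mark = midpoint of each pair of consecutive boundaries.
marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n
variance = sum(f * x * x for f, x in zip(freqs, marks)) / n - mean ** 2
sd = sqrt(variance)

print(marks)
print(round(mean, 1), round(sd, 1))
```

To 3 significant figures the estimates are a mean of about 51.7 and a standard deviation of about 20.9.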