The heights of 100 students have mean 165 cm and variance 49 cm². A new student of height 181 cm joins the group. Find the new mean and new variance.
Solution:
Original: $n = 100$, $\bar{x} = 165$, $\sigma^2 = 49$.
Sum of original data: $\sum x_i = 100 \times 165 = 16500$.
Sum of squares of original data: $\sum x_i^2 = n(\sigma^2 + \bar{x}^2) = 100(49 + 27225) = 100 \times 27274 = 2727400$.
After adding 181:
New sum: 16500+181=16681.
New n=101.
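New mean: $\bar{x}_{\text{new}} = \frac{16681}{101} \approx 165.16$.
New sum of squares: $2727400 + 181^2 = 2727400 + 32761 = 2760161$.
New variance: $\sigma_{\text{new}}^2 = \frac{2760161}{101} - \left(\frac{16681}{101}\right)^2 = \frac{520500}{10201} \approx 51.02\ \text{cm}^2$.
Keep the mean unrounded at this step: using the rounded value $165.16$ gives $27328.33 - 27277.83 \approx 50.50$, which is inaccurate because a small rounding error in the mean grows after squaring and then subtracting two nearly equal numbers.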
Note: variance has units cm², while standard deviation has units cm.
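As a minimal sketch using only Python's standard `fractions` module (the helper name `add_value` is hypothetical), the update can be checked with exact arithmetic, recovering $\sum x_i$ and $\sum x_i^2$ from the summary statistics as in the solution. Exact arithmetic gives a new variance of $\frac{520500}{10201} \approx 51.02$.

```python
from fractions import Fraction

def add_value(n, mean, variance, x):
    """Return (n, mean, variance) after appending x to a data set.

    Uses population variance (divide by n). Pass ints or Fractions so the
    arithmetic stays exact.
    """
    total = n * mean                      # sum of x_i
    total_sq = n * (variance + mean ** 2) # sum of x_i^2
    n_new = n + 1
    mean_new = Fraction(total + x, n_new)
    var_new = Fraction(total_sq + x ** 2, n_new) - mean_new ** 2
    return n_new, mean_new, var_new

n_new, mean_new, var_new = add_value(100, 165, 49, 181)
print(n_new, float(mean_new), var_new, float(var_new))
```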
UT-2: Coding Effect on Mean and Standard Deviation
Question:
A set of data has mean $m$ and standard deviation $s$. If each value is transformed by $y = 3x - 5$, find the new mean and new standard deviation in terms of $m$ and $s$.
Solution:
For the transformation $y = ax + b$:
New mean $= a \times \text{old mean} + b = 3m - 5$.
New standard deviation $= |a| \times \text{old standard deviation} = 3s$.
New variance $= a^2 \times \text{old variance} = 9s^2$.
The additive constant $-5$ shifts the mean but does NOT affect the standard deviation; a common mistake is to apply the $-5$ to the standard deviation as well.
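A quick numerical check of the coding rules, as a sketch: the five data values are arbitrary illustrative numbers (not from the question), and `statistics.fmean` and `statistics.pstdev` give the mean and population standard deviation.

```python
import statistics

data = [2.0, 5.0, 7.0, 10.0, 11.0]   # arbitrary illustrative values
m = statistics.fmean(data)
s = statistics.pstdev(data)

coded = [3 * x - 5 for x in data]     # y = 3x - 5
m_y = statistics.fmean(coded)
s_y = statistics.pstdev(coded)

# The mean follows the full transformation; the SD only picks up |a| = 3.
print(m_y, 3 * m - 5)
print(s_y, 3 * s)
```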
IT-1: Dispersion and Probability (with Probability)
Question:
A random variable $X$ takes values $1, 2, 3, 4, 5$ with probabilities $\frac{1}{15}, \frac{2}{15}, \frac{3}{15}, \frac{4}{15}, \frac{5}{15}$ respectively. Find $E(X)$ and $\operatorname{Var}(X)$.
Solution:
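$E(X) = 1 \times \frac{1}{15} + 2 \times \frac{2}{15} + 3 \times \frac{3}{15} + 4 \times \frac{4}{15} + 5 \times \frac{5}{15}$
$= \frac{1 + 4 + 9 + 16 + 25}{15} = \frac{55}{15} = \frac{11}{3}$
$E(X^2) = \frac{1 + 8 + 27 + 64 + 125}{15} = \frac{225}{15} = 15$
$\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = 15 - \frac{121}{9} = \frac{135 - 121}{9} = \frac{14}{9}$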
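The same expectation and variance can be reproduced with exact fractions; a minimal sketch, assuming the distribution $P(X = v) = \frac{v}{15}$ from the question.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5]
probs = [Fraction(v, 15) for v in values]   # P(X = v) = v/15
assert sum(probs) == 1                       # probabilities sum to 1

e_x = sum(v * p for v, p in zip(values, probs))
e_x2 = sum(v * v * p for v, p in zip(values, probs))
var_x = e_x2 - e_x ** 2                      # Var(X) = E(X^2) - [E(X)]^2

print(e_x, e_x2, var_x)   # 11/3 15 14/9
```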
IT-2: Dispersion and Inequalities (with Inequalities)
Question:
A set of 50 numbers has mean 20 and standard deviation 4. Using Chebyshev's inequality (or empirical reasoning), at least what percentage of the data lies within 2 standard deviations of the mean?
Solution:
Within 2 standard deviations: $20 \pm 2(4) = [12, 28]$.
By Chebyshev's inequality, at least $1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$ of the data lies within $k = 2$ standard deviations of the mean.
If the data is approximately normally distributed, the empirical rule gives approximately 95%, but Chebyshev gives the guaranteed minimum of 75%.
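Chebyshev's bound holds for any data set, which the sketch below illustrates. The helper names `chebyshev_bound` and `proportion_within` are hypothetical, and the 50 values are generated arbitrarily (they are not data from the question) and then rescaled to have mean 20 and population standard deviation 4.

```python
import random
import statistics

def chebyshev_bound(k):
    """Guaranteed minimum proportion within k SDs of the mean (useful for k > 1)."""
    return 1 - 1 / k ** 2

def proportion_within(data, k):
    """Proportion of values with |x - mean| <= k * population SD."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

# Hypothetical skewed data set of 50 values, rescaled to mean 20, population SD 4.
rng = random.Random(0)
raw = [rng.expovariate(1.0) for _ in range(50)]
raw_mean, raw_sd = statistics.fmean(raw), statistics.pstdev(raw)
data = [20 + 4 * (x - raw_mean) / raw_sd for x in raw]

print(chebyshev_bound(2))          # 0.75
print(proportion_within(data, 2))  # never below 0.75, whatever the data
```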
IT-3: Dispersion and Combinatorics (with Combinatorics)
Question:
All possible samples of size 2 are drawn with replacement from the population 2,4,6. Find the mean and variance of the sampling distribution of the sample mean.
Solution:
Population: $2, 4, 6$, $\mu = 4$, $\sigma^2 = \frac{4 + 0 + 4}{3} = \frac{8}{3}$.
All samples of size 2 with replacement (9 samples):
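The nine samples, their means and the two summary values can be enumerated with a short sketch using Python's `itertools.product` and the `statistics` module, with exact `Fraction` arithmetic throughout. The results agree with the standard facts $E(\bar{X}) = \mu = 4$ and $\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{8/3}{2} = \frac{4}{3}$ for sampling with replacement.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance

population = [Fraction(v) for v in (2, 4, 6)]
mu = mean(population)           # 4
sigma2 = pvariance(population)  # 8/3, dividing by N

# All ordered samples of size 2 drawn with replacement: 3 * 3 = 9 samples.
samples = list(product(population, repeat=2))
sample_means = [(a + b) / 2 for a, b in samples]

for (a, b), xbar in zip(samples, sample_means):
    print(f"({a}, {b}) -> mean {xbar}")

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)
print(mean_of_means, var_of_means)   # 4 4/3
```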
DSE Exam Technique: Always set up the table with columns for $fx$ and $fx^2$. This earns method marks even if there is a minor calculation error. Leave the standard deviation in exact form unless asked for a decimal.
Find the median, quartiles, and interquartile range of the data set:
12,15,18,20,22,25,28,30,35,40,45
Solution:
There are n=11 data values (odd).
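Median position: $\frac{11 + 1}{2} = 6$. Median $= 25$.
Lower half (the median is excluded from both halves): 12, 15, 18, 20, 22 (5 values).
$Q_1$ position: $\frac{5 + 1}{2} = 3$. $Q_1 = 18$.
Upper half: 28, 30, 35, 40, 45 (5 values).
$Q_3$ position: $\frac{5 + 1}{2} = 3$. $Q_3 = 35$.
$\text{IQR} = Q_3 - Q_1 = 35 - 18 = 17$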
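A sketch of the "median of each half" convention used above; `quartiles` is a hypothetical helper name, and other conventions (for example interpolation-based percentile methods) can give different $Q_1$ and $Q_3$ values for the same data.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' convention.

    For an odd number of values, the overall median is excluded from
    both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45]
q1, med, q3 = quartiles(data)
print(q1, med, q3, q3 - q1)   # 18 25 35 17
```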
WE-5: Comparing Distributions Using Standard Deviation
Question:
Class A has test scores with mean 65 and standard deviation 8. Class B has test scores with mean 68 and standard deviation 15. Which class has more consistent performance? Justify your answer.
Solution:
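The coefficient of variation, $CV = \frac{\sigma}{\bar{x}} \times 100\%$, measures dispersion relative to the mean:
$CV_A = \frac{8}{65} \times 100\% \approx 12.3\%$
$CV_B = \frac{15}{68} \times 100\% \approx 22.1\%$
Since $CV_A < CV_B$, Class A has more consistent performance.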
Alternatively, since the two means are close, the standard deviations can be compared directly: Class A has $\sigma = 8$ and Class B has $\sigma = 15$, so Class A shows less variability, i.e. more consistent performance.
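The coefficient of variation can be computed directly; `coefficient_of_variation` is a hypothetical helper name in this sketch.

```python
def coefficient_of_variation(mean, sd):
    """Relative dispersion as a percentage: sd / mean * 100."""
    return sd / mean * 100

cv_a = coefficient_of_variation(65, 8)
cv_b = coefficient_of_variation(68, 15)
print(f"CV_A = {cv_a:.1f}%, CV_B = {cv_b:.1f}%")   # CV_A = 12.3%, CV_B = 22.1%
print("More consistent:", "Class A" if cv_a < cv_b else "Class B")
```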
A data set is {5,8,10,12,15,18,45}. The value 45 is suspected to be an outlier.
(a) Calculate the mean and standard deviation of all 7 values.
(b) Calculate the mean and standard deviation after removing 45.
(c) Comment on the effect of the outlier.
Solution:
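(a) Sum $= 113$, $\bar{x} = \frac{113}{7} \approx 16.14$.
$\sum x^2 = 25 + 64 + 100 + 144 + 225 + 324 + 2025 = 2907$.
$\sigma^2 = \frac{2907}{7} - \left(\frac{113}{7}\right)^2 = \frac{20349 - 12769}{49} = \frac{7580}{49} \approx 154.69$.
$\sigma \approx 12.44$.
(b) After removing 45: sum $= 68$, $n = 6$, $\bar{x} = \frac{68}{6} = \frac{34}{3} \approx 11.33$.
$\sum x^2 = 882$, $\sigma^2 = \frac{882}{6} - \left(\frac{34}{3}\right)^2 = 147 - 128.44 = 18.56$.
$\sigma \approx 4.31$.
(c) The outlier 45 substantially increases both the mean (from 11.33 to 16.14) and the standard deviation (from 4.31 to 12.44). The standard deviation nearly triples, showing that a single outlier has a disproportionately large effect on measures of dispersion, because deviations from the mean are squared.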
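The figures in (a) and (b) can be reproduced with Python's `statistics` module; a minimal sketch in which `pstdev` divides by $n$, matching the population formula used in the solution.

```python
from statistics import fmean, pstdev

data = [5, 8, 10, 12, 15, 18, 45]
without_outlier = [x for x in data if x != 45]

for label, xs in [("all 7 values", data), ("without 45", without_outlier)]:
    # pstdev is the population standard deviation (divide by n)
    print(f"{label}: mean = {fmean(xs):.2f}, SD = {pstdev(xs):.2f}")
# all 7 values: mean = 16.14, SD = 12.44
# without 45: mean = 11.33, SD = 4.31
```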
WE-7: Using a Given Mean to Find Missing Frequency
Question:
The mean of the following data is 4.5. Find the value of k.
| $x$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $f$ | 2 | 4 | $k$ | 6 | 3 | 1 |
Solution:
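Total frequency: $n = 2 + 4 + k + 6 + 3 + 1 = 16 + k$.
Sum: $\sum fx = 2 + 8 + 3k + 24 + 15 + 6 = 55 + 3k$.
$\bar{x} = \frac{55 + 3k}{16 + k} = 4.5$
$55 + 3k = 4.5(16 + k) = 72 + 4.5k$
$1.5k = 55 - 72 = -17$
$k = -\frac{17}{1.5} = -\frac{34}{3}$
A frequency must be a non-negative integer, and $-\frac{34}{3}$ is negative, so no valid value of $k$ exists. Indeed, for $k \ge 0$ the mean $\frac{55 + 3k}{16 + k}$ decreases as $k$ increases, so it is at most $\frac{55}{16} \approx 3.44$ (its value at $k = 0$), and a mean of 4.5 cannot be reached with these frequencies.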
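A minimal sketch that solves the equation exactly and confirms the conclusion; `mean_for` is a hypothetical helper that evaluates the table's mean for a given $k$. It gives $k = -\frac{34}{3}$ and shows that even with $k = 0$ the mean is only $\frac{55}{16} = 3.4375$.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5, 6]

def mean_for(k):
    """Mean of the frequency table when the frequency of x = 3 is k."""
    fs = [2, 4, k, 6, 3, 1]
    return Fraction(sum(f * x for f, x in zip(fs, xs)), sum(fs))

# Solve (55 + 3k) / (16 + k) = 9/2: 2(55 + 3k) = 9(16 + k), so -3k = 34.
k = Fraction(2 * 55 - 9 * 16, 9 - 2 * 3)
print(k)                   # -34/3, negative, so no valid frequency
print(float(mean_for(0)))  # 3.4375, the largest mean for k >= 0
```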
Confusing population variance with sample variance. The population variance formula divides by n, while the sample variance divides by n−1 (Bessel's correction). In DSE Maths, unless specified otherwise, use the population formula (divide by n).
Forgetting that variance has squared units. If data is in centimetres, the variance is in cm² and the standard deviation is in cm. Do not mix up units when writing conclusions.
Incorrectly applying coding formulas. For the transformation $y = ax + b$: new mean $= a\bar{x} + b$, new SD $= |a| \cdot s$. The additive constant $b$ does NOT affect the standard deviation. A common error is writing new SD $= as + b$.
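Using the wrong formula for combined variance. When combining two data sets, do not simply average the variances. Use the formula that includes the deviation of each set's mean from the combined mean $\bar{x}$: $\sigma^2 = \frac{n_1\left[\sigma_1^2 + (\bar{x}_1 - \bar{x})^2\right] + n_2\left[\sigma_2^2 + (\bar{x}_2 - \bar{x})^2\right]}{n_1 + n_2}$.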
Misidentifying quartile positions. Different textbooks use different conventions for finding Q1 and Q3. In DSE, the most common approach is: Q1 is the median of the lower half and Q3 is the median of the upper half.
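To see why averaging the variances fails, the sketch below compares the combined-variance formula with a direct calculation on the merged data. The two groups are made-up illustrative values and `combined_variance` is a hypothetical helper name; `statistics.pvariance` is the population variance (divide by $n$).

```python
from statistics import fmean, pvariance

def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Population variance of two groups merged, from their summaries."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n   # combined mean
    return (n1 * (var1 + (mean1 - mean) ** 2)
            + n2 * (var2 + (mean2 - mean) ** 2)) / n

# Hypothetical example groups, used only to check the formula.
group1 = [3, 5, 7, 9]
group2 = [10, 12, 20]

formula = combined_variance(len(group1), fmean(group1), pvariance(group1),
                            len(group2), fmean(group2), pvariance(group2))
direct = pvariance(group1 + group2)
naive = (pvariance(group1) + pvariance(group2)) / 2   # the common mistake
# statistics.variance(...) would instead give the sample variance (divide by n - 1).

print(formula, direct, naive)
```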
The following table shows the distribution of marks obtained by 40 students in a test.
| Marks | 1–10 | 11–20 | 21–30 | 31–40 | 41–50 |
|---|---|---|---|---|---|
| Frequency | 3 | 8 | 14 | 10 | 5 |
(a) Estimate the mean mark. (3 marks)
(b) Estimate the standard deviation of the marks. (3 marks)
(c) If a student scored 35 marks, find the student's standardised score (z-score). (2 marks)
Solution:
(a) Midpoints: 5.5,15.5,25.5,35.5,45.5.
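| $x$ (midpoint) | $f$ | $fx$ | $fx^2$ |
|---|---|---|---|
| 5.5 | 3 | 16.5 | 90.75 |
| 15.5 | 8 | 124 | 1922 |
| 25.5 | 14 | 357 | 9103.5 |
| 35.5 | 10 | 355 | 12602.5 |
| 45.5 | 5 | 227.5 | 10351.25 |
| Total | 40 | 1080 | 34070 |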
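$\bar{x} = \frac{1080}{40} = 27$
(b) $\sigma^2 = \frac{34070}{40} - 27^2 = 851.75 - 729 = 122.75$.
$\sigma = \sqrt{122.75} \approx 11.08$.
(c) $z = \frac{35 - 27}{11.08} = \frac{8}{11.08} \approx 0.72$.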
The student scored 0.72 standard deviations above the mean.
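The table calculation can be reproduced as a short sketch, assuming (as in the solution) that each mark is represented by its class midpoint and that the population formula (divide by $n$) is used.

```python
import math

# Class intervals (lower, upper) and frequencies from the table.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [3, 8, 14, 10, 5]

mids = [(lo + hi) / 2 for lo, hi in classes]   # 5.5, 15.5, ..., 45.5
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n
sd = math.sqrt(sum_fx2 / n - mean ** 2)   # population formula (divide by n)
z = (35 - mean) / sd                      # standardised score for 35 marks

print(n, sum_fx, sum_fx2)               # 40 1080.0 34070.0
print(mean, round(sd, 2), round(z, 2))  # 27.0 11.08 0.72
```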
The weights (in kg) of 8 parcels are: 2.3,3.1,4.5,5.2,3.8,4.1,2.9,3.6.
(a) Find the mean and standard deviation. (4 marks)
(b) If each parcel has a label weighing 0.1 kg added, find the new mean and new standard deviation. (2 marks)
(c) If the weight of each parcel is converted to grams, find the new mean and new variance. (2 marks)
A box-and-whisker diagram of the daily temperatures (in °C) recorded in a city over 30 days has the following five-number summary:
Minimum $= 12$, $Q_1 = 18$, Median $= 22$, $Q_3 = 28$, Maximum $= 35$.
(a) Find the interquartile range. (1 mark)
(b) Determine the lower and upper fences and identify any outliers. (3 marks)
(c) What percentage of the data lies between 18 and 28? (1 mark)
Solution:
(a) $\text{IQR} = Q_3 - Q_1 = 28 - 18 = 10$.
(b) Lower fence $= Q_1 - 1.5 \times \text{IQR} = 18 - 15 = 3$.
Upper fence $= Q_3 + 1.5 \times \text{IQR} = 28 + 15 = 43$.
Since all values (12 to 35) lie within [3,43], there are no outliers.
(c) By definition, 50% of the data lies between Q1 and Q3.
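The fence rule can be written as a small function; `fences` is a hypothetical helper name, and the multiplier `k = 1.5` is the one used in the solution. Only the minimum and maximum need checking, since every other value lies between them.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

minimum, q1, median, q3, maximum = 12, 18, 22, 28, 35
lower, upper = fences(q1, q3)
print(q3 - q1, lower, upper)   # 10 3.0 43.0

# If the extremes are inside the fences, every other value is too.
has_outliers = minimum < lower or maximum > upper
print("Outliers present:", has_outliers)   # Outliers present: False
```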