Dispersion — Diagnostic Tests

Unit Tests

Tests edge cases, boundary conditions, and common misconceptions for dispersion.

UT-1: Variance vs Standard Deviation Units

Question:

The heights of 100 students have mean $165$ cm and variance $49$ cm$^2$. A new student of height $181$ cm joins the group. Find the new mean and new variance.

Solution:

Original: $n = 100$, $\bar{x} = 165$, $\sigma^2 = 49$.

Sum of original data: $\sum x_i = 100 \times 165 = 16500$.

Sum of squares of original data: $\sum x_i^2 = n(\sigma^2 + \bar{x}^2) = 100(49 + 27225) = 100 \times 27274 = 2727400$.

After adding 181:

New sum: $16500 + 181 = 16681$.

New $n = 101$.

New mean: $\bar{x}_{\text{new}} = \dfrac{16681}{101} \approx 165.16$.

New sum of squares: $2727400 + 181^2 = 2727400 + 32761 = 2760161$.

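New variance: $\sigma^2_{\text{new}} = \dfrac{2760161}{101} - \left(\dfrac{16681}{101}\right)^2 \approx 27328.3267 - 27277.3023 \approx 51.02$ cm$^2$. Use the unrounded mean here: squaring the rounded value $165.16$ gives $27277.83$ and shifts the answer by about $0.5$.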

Note: variance has units cm$^2$, while standard deviation has units cm.
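The update can be checked numerically. The sketch below is illustrative only: the helper name `add_value` is an assumption made for this example, and it uses the population formula (divide by $n$) with exact fractions so that no rounding enters the subtraction.

```python
from fractions import Fraction

def add_value(n, mean, variance, new_value):
    """Return (new_n, new_mean, new_variance) after appending one value.

    Population variance (divide by n) is assumed throughout.
    `add_value` is a hypothetical helper name chosen for this sketch.
    """
    total = n * Fraction(mean)                                  # sum of x_i
    total_sq = n * (Fraction(variance) + Fraction(mean) ** 2)   # sum of x_i^2
    total += new_value
    total_sq += Fraction(new_value) ** 2
    new_n = n + 1
    new_mean = total / new_n
    new_variance = total_sq / new_n - new_mean ** 2
    return new_n, new_mean, new_variance

new_n, new_mean, new_var = add_value(100, 165, 49, 181)
print(new_n, float(new_mean), float(new_var))
```

Keeping the sums as exact fractions until the last step avoids the loss of accuracy that comes from subtracting two large, nearly equal rounded numbers.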


UT-2: Coding Effect on Mean and Standard Deviation

Question:

A set of data has mean $m$ and standard deviation $s$. If each value is transformed by $y = 3x - 5$, find the new mean and new standard deviation in terms of $m$ and $s$.

Solution:

For the transformation $y = ax + b$:

  • New mean $= a \times \text{old mean} + b = 3m - 5$.
  • New standard deviation $= |a| \times \text{old standard deviation} = 3s$.
  • New variance $= a^2 \times \text{old variance} = 9s^2$.

The additive constant $-5$ affects the mean but NOT the standard deviation.

A common mistake is thinking the $-5$ affects the standard deviation.
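A quick numerical check of the coding rules, as a sketch: the data values below are made up for illustration and are not part of the question.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative values (assumption)
coded = [3 * x - 5 for x in data]    # apply y = 3x - 5 to every value

m, s = mean(data), pstdev(data)      # population SD, divide by n
print(mean(coded), 3 * m - 5)        # both give the coded mean
print(pstdev(coded), 3 * s)          # both give the coded SD; -5 plays no role
```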


UT-3: Grouped Data Midpoints

Question:

Find the estimated mean and estimated standard deviation from the following grouped data:

| Class | Frequency |
| --- | --- |
| $0 \leq x \lt 10$ | 5 |
| $10 \leq x \lt 20$ | 8 |
| $20 \leq x \lt 30$ | 12 |
| $30 \leq x \lt 40$ | 5 |

Solution:

Midpoints: $5, 15, 25, 35$.

| $x$ | $f$ | $fx$ | $fx^2$ |
| --- | --- | --- | --- |
| 5 | 5 | 25 | 125 |
| 15 | 8 | 120 | 1800 |
| 25 | 12 | 300 | 7500 |
| 35 | 5 | 175 | 6125 |
| Total | 30 | 620 | 15550 |

Estimated mean: $\bar{x} = \dfrac{620}{30} = \dfrac{62}{3} \approx 20.67$.

Estimated variance: $\dfrac{15550}{30} - \left(\dfrac{62}{3}\right)^2 = \dfrac{1555}{3} - \dfrac{3844}{9} = \dfrac{4665 - 3844}{9} = \dfrac{821}{9} \approx 91.22$.

Estimated SD $= \sqrt{\dfrac{821}{9}} = \dfrac{\sqrt{821}}{3} \approx 9.55$.
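The midpoint method can be sketched in a few lines. The function name `grouped_stats` and the `(lower, upper, frequency)` tuple layout are assumptions made for this example; the population formula (divide by $n$) is used.

```python
from math import sqrt

def grouped_stats(classes):
    """Estimate (mean, variance, sd) from grouped data.

    classes: list of (lower, upper, frequency) tuples (hypothetical layout).
    Every value in a class is treated as sitting at the class midpoint.
    """
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, variance, sqrt(variance)

est_mean, est_var, est_sd = grouped_stats(
    [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 5)]
)
print(round(est_mean, 2), round(est_var, 2), round(est_sd, 2))
```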


UT-4: Box Plot Interpretation

Question:

A box plot has minimum $= 10$, $Q_1 = 25$, median $= 35$, $Q_3 = 50$, maximum $= 80$. Find the interquartile range and identify any outliers.

Solution:

$\text{IQR} = Q_3 - Q_1 = 50 - 25 = 25$.

Lower fence: $Q_1 - 1.5 \times \text{IQR} = 25 - 37.5 = -12.5$.

Upper fence: $Q_3 + 1.5 \times \text{IQR} = 50 + 37.5 = 87.5$.

Since all values ($10$ to $80$) fall within $[-12.5,\; 87.5]$, there are no outliers.
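The fence calculation can be sketched as follows. The helper name `fences` is an assumption for this example; because only the five-number summary is known, checking the minimum and maximum is enough to decide whether any value lies outside the fences.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = fences(25, 50)
# Only the extremes are known from the box plot, so test those two values.
outliers = [v for v in (10, 80) if v < lower or v > upper]
print(lower, upper, outliers)   # -12.5 87.5 []
```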


UT-5: Combined Data Sets

Question:

Set $A$ has $n_A = 8$, $\bar{x}_A = 10$, $s_A^2 = 4$. Set $B$ has $n_B = 12$, $\bar{x}_B = 15$, $s_B^2 = 9$. Find the mean and variance of the combined set $A \cup B$.

Solution:

Combined $n = 8 + 12 = 20$.

Combined mean: $\bar{x} = \dfrac{8 \times 10 + 12 \times 15}{20} = \dfrac{80 + 180}{20} = \dfrac{260}{20} = 13$.

Combined variance using the formula:

$$s^2 = \frac{n_A(s_A^2 + d_A^2) + n_B(s_B^2 + d_B^2)}{n_A + n_B}$$

where $d_A = \bar{x}_A - \bar{x} = 10 - 13 = -3$ and $d_B = \bar{x}_B - \bar{x} = 15 - 13 = 2$.

$$s^2 = \frac{8(4 + 9) + 12(9 + 4)}{20} = \frac{8 \times 13 + 12 \times 13}{20} = \frac{104 + 156}{20} = \frac{260}{20} = 13$$
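The combined-variance formula translates directly into code. This is a sketch under the assumption that both variances are population variances (divide by $n$); the function name `combine` is made up for the example.

```python
from fractions import Fraction

def combine(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Return (n, mean, variance) of two data sets pooled together.

    Assumes population variances; `combine` is a hypothetical helper.
    """
    n = n_a + n_b
    mean = Fraction(n_a * mean_a + n_b * mean_b) / n
    d_a, d_b = mean_a - mean, mean_b - mean      # offsets from the combined mean
    var = (n_a * (var_a + d_a ** 2) + n_b * (var_b + d_b ** 2)) / n
    return n, mean, var

n, combined_mean, combined_var = combine(8, 10, 4, 12, 15, 9)
print(n, combined_mean, combined_var)   # 20 13 13
```

Note that the result is not the average of the two variances: the $d^2$ terms add the spread between the two group means.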


Integration Tests

Tests synthesis of dispersion with other topics.

IT-1: Dispersion and Probability (with Probability)

Question:

A random variable $X$ takes values $1, 2, 3, 4, 5$ with probabilities $\dfrac{1}{15}$, $\dfrac{2}{15}$, $\dfrac{3}{15}$, $\dfrac{4}{15}$, $\dfrac{5}{15}$ respectively. Find $E(X)$ and $\text{Var}(X)$.

Solution:

$$E(X) = 1 \times \frac{1}{15} + 2 \times \frac{2}{15} + 3 \times \frac{3}{15} + 4 \times \frac{4}{15} + 5 \times \frac{5}{15}$$

$$= \frac{1 + 4 + 9 + 16 + 25}{15} = \frac{55}{15} = \frac{11}{3}$$

$$E(X^2) = \frac{1 + 8 + 27 + 64 + 125}{15} = \frac{225}{15} = 15$$

$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 15 - \frac{121}{9} = \frac{135 - 121}{9} = \frac{14}{9}$$
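The same expectation and variance can be reproduced with exact fractions. The variable names below are chosen for this sketch only.

```python
from fractions import Fraction

values = [1, 2, 3, 4, 5]
probs = [Fraction(x, 15) for x in values]    # P(X = x) = x/15
assert sum(probs) == 1                       # probabilities must total 1

e_x = sum(x * p for x, p in zip(values, probs))
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))
var_x = e_x2 - e_x ** 2
print(e_x, e_x2, var_x)   # 11/3 15 14/9
```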


IT-2: Dispersion and Inequalities (with Inequalities)

Question:

A set of 50 numbers has mean $20$ and standard deviation $4$. Using Chebyshev's inequality (or empirical reasoning), at least what percentage of the data lies within 2 standard deviations of the mean?

Solution:

Within 2 standard deviations: $20 \pm 2(4) = [12,\; 28]$.

By Chebyshev's inequality, at least $1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{4} = \dfrac{3}{4} = 75\%$ of data lies within $k = 2$ standard deviations.

If the data is approximately normally distributed, the empirical rule gives approximately $95\%$, but Chebyshev gives the guaranteed minimum of $75\%$.
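As a small sketch, the Chebyshev bound and the interval can be computed together. The function name `chebyshev_lower_bound` is an assumption for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

mean, sd, k = 20, 4, 2
interval = (mean - k * sd, mean + k * sd)
print(interval, chebyshev_lower_bound(k))   # (12, 28) 0.75
```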


IT-3: Dispersion and Combinatorics (with Combinatorics)

Question:

All possible samples of size 2 are drawn with replacement from the population $\{2, 4, 6\}$. Find the mean and variance of the sampling distribution of the sample mean.

Solution:

Population: $\{2, 4, 6\}$, $\mu = 4$, $\sigma^2 = \dfrac{(4+0+4)}{3} = \dfrac{8}{3}$.

All samples of size 2 with replacement (9 samples):

$(2,2), (2,4), (2,6), (4,2), (4,4), (4,6), (6,2), (6,4), (6,6)$.

Sample means: $2, 3, 4, 3, 4, 5, 4, 5, 6$.

Mean of sample means: $\dfrac{2+3+4+3+4+5+4+5+6}{9} = \dfrac{36}{9} = 4 = \mu$.

Variance of sample means: $\dfrac{(2-4)^2 + 2(3-4)^2 + 3(4-4)^2 + 2(5-4)^2 + (6-4)^2}{9}$

$= \dfrac{4 + 2 + 0 + 2 + 4}{9} = \dfrac{12}{9} = \dfrac{4}{3} = \dfrac{\sigma^2}{n}$.

This confirms: $E(\bar{X}) = \mu$ and $\text{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.
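The enumeration can be reproduced by brute force. This sketch lists every ordered sample with `itertools.product`; the variable names are chosen for the example.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]
n = 2                                             # sample size

samples = list(product(population, repeat=n))     # 9 ordered samples
means = [Fraction(sum(s), n) for s in samples]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)
print(mean_of_means, var_of_means, sigma2 / n)    # 4 4/3 4/3
```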


Worked Examples

WE-1: Effect of Adding a Constant to All Data

Question:

A set of 5 numbers has mean $10$ and standard deviation $3$. If $7$ is added to each number, find the new mean, new standard deviation, and new variance.

Solution:

Transformation: $y = x + 7$ (i.e. $a = 1$, $b = 7$).

New mean $= 1 \times 10 + 7 = 17$.

New standard deviation $= |1| \times 3 = 3$.

New variance $= 1^2 \times 9 = 9$.

The additive constant $7$ shifts the data but does not affect the spread.


WE-2: Standard Deviation from a Frequency Distribution

Question:

Find the mean and standard deviation of the following data:

| $x$ | 2 | 4 | 6 | 8 | 10 |
| --- | --- | --- | --- | --- | --- |
| $f$ | 3 | 5 | 8 | 4 | 2 |

Solution:

| $x$ | $f$ | $fx$ | $fx^2$ |
| --- | --- | --- | --- |
| 2 | 3 | 6 | 12 |
| 4 | 5 | 20 | 80 |
| 6 | 8 | 48 | 288 |
| 8 | 4 | 32 | 256 |
| 10 | 2 | 20 | 200 |
| Total | 22 | 126 | 836 |

$$\bar{x} = \frac{126}{22} = \frac{63}{11} \approx 5.727$$

$$\sigma^2 = \frac{836}{22} - \left(\frac{63}{11}\right)^2 = \frac{418}{11} - \frac{3969}{121} = \frac{4598 - 3969}{121} = \frac{629}{121}$$

$$\sigma = \sqrt{\frac{629}{121}} = \frac{\sqrt{629}}{11} \approx 2.28$$

DSE Exam Technique: Always set up the table with columns for $fx$ and $fx^2$. This earns method marks even if there is a minor calculation error. Leave standard deviation in exact form unless asked for a decimal.


WE-3: Coding with Grouped Data

Question:

Using the coding $y = \dfrac{x - 25}{5}$, the coded data has mean $\bar{y} = 1.2$ and variance $s_y^2 = 4.8$. Find the mean and standard deviation of the original data.

Solution:

The coding is $y = \dfrac{x - 25}{5} = \dfrac{1}{5}x - 5$, so $a = \dfrac{1}{5}$ and $b = -5$.

Original mean: $\bar{x} = \dfrac{\bar{y} - b}{a} = \dfrac{1.2 + 5}{1/5} = 6.2 \times 5 = 31$.

Original standard deviation: $\sigma_x = \dfrac{\sigma_y}{|a|} = \dfrac{\sqrt{4.8}}{1/5} = 5\sqrt{4.8} = 5 \times 2\sqrt{1.2} = 10\sqrt{1.2} = 2\sqrt{30} \approx 10.95$.
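Undoing a coding can be sketched as a short function. The name `decode` and its argument order are assumptions for this example.

```python
from math import sqrt

def decode(mean_y, var_y, a, b):
    """Invert the coding y = a*x + b and return (mean_x, sd_x)."""
    return (mean_y - b) / a, sqrt(var_y) / abs(a)

mean_x, sd_x = decode(1.2, 4.8, a=1 / 5, b=-5)
print(round(mean_x, 2), round(sd_x, 2))   # 31.0 10.95
```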


WE-4: Interquartile Range from Raw Data

Question:

Find the median, quartiles, and interquartile range of the data set:

$12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45$

Solution:

There are $n = 11$ data values (odd).

Median position: $\dfrac{11 + 1}{2} = 6$. Median $= 25$.

Lower half: $12, 15, 18, 20, 22$ (5 values).

$Q_1$ position: $\dfrac{5 + 1}{2} = 3$. $Q_1 = 18$.

Upper half: $28, 30, 35, 40, 45$ (5 values).

$Q_3$ position: $\dfrac{5 + 1}{2} = 3$. $Q_3 = 35$.

$$\text{IQR} = Q_3 - Q_1 = 35 - 18 = 17$$
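The median-of-halves convention used above can be sketched as follows. The function name `quartiles` is an assumption for this example; when $n$ is odd the median itself is left out of both halves, matching the working shown.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

q1, q2, q3 = quartiles([12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45])
print(q1, q2, q3, q3 - q1)   # 18 25 35 17
```

Library routines may follow a different quartile convention, so their results can differ from this hand method on other data sets.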


WE-5: Comparing Distributions Using Standard Deviation

Question:

Class A has test scores with mean $65$ and standard deviation $8$. Class B has test scores with mean $68$ and standard deviation $15$. Which class has more consistent performance? Justify your answer.

Solution:

The coefficient of variation (CV) measures relative dispersion:

$$\text{CV}_A = \frac{8}{65} \times 100\% \approx 12.3\%$$

$$\text{CV}_B = \frac{15}{68} \times 100\% \approx 22.1\%$$

Since $\text{CV}_A < \text{CV}_B$, Class A has more consistent performance.

Alternatively, comparing standard deviations directly: Class A has $\sigma = 8$ and Class B has $\sigma = 15$. The smaller standard deviation of Class A indicates less variability, i.e. more consistency.


WE-6: Removing an Outlier

Question:

A data set is $\{5, 8, 10, 12, 15, 18, 45\}$. The value $45$ is suspected to be an outlier.

(a) Calculate the mean and standard deviation of all 7 values.

(b) Calculate the mean and standard deviation after removing $45$.

(c) Comment on the effect of the outlier.

Solution:

(a) Sum $= 113$, $\bar{x} = \dfrac{113}{7} \approx 16.14$.

$\sum x^2 = 25 + 64 + 100 + 144 + 225 + 324 + 2025 = 2907$.

$\sigma^2 = \dfrac{2907}{7} - \left(\dfrac{113}{7}\right)^2 = \dfrac{20349 - 12769}{49} = \dfrac{7580}{49} \approx 154.69$.

$\sigma \approx 12.44$.

(b) After removing $45$: sum $= 68$, $n = 6$, $\bar{x} = \dfrac{68}{6} = \dfrac{34}{3} \approx 11.33$.

$\sum x^2 = 882$, $\sigma^2 = \dfrac{882}{6} - \left(\dfrac{34}{3}\right)^2 \approx 147 - 128.44 = 18.56$.

$\sigma \approx 4.31$.

(c) The outlier $45$ significantly increases both the mean (from $11.33$ to $16.14$) and the standard deviation (from $4.31$ to $12.44$). The standard deviation is nearly tripled, showing that a single outlier can have a disproportionate effect on measures of dispersion.
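The comparison in parts (a) and (b) can be reproduced with the standard library. This is a sketch using `statistics.pstdev`, which divides by $n$ in line with the population formula used in these notes.

```python
from statistics import mean, pstdev

data = [5, 8, 10, 12, 15, 18, 45]
trimmed = [x for x in data if x != 45]     # same data without the suspected outlier

for label, xs in (("with 45", data), ("without 45", trimmed)):
    print(label, round(mean(xs), 2), round(pstdev(xs), 2))

ratio = pstdev(data) / pstdev(trimmed)     # how many times larger the SD becomes
print(round(ratio, 2))
```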


WE-7: Using a Given Mean to Find Missing Frequency

Question:

The mean of the following data is $4.5$. Find the value of $k$.

| $x$ | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| $f$ | 2 | 4 | $k$ | 6 | 3 | 1 |

Solution:

Total frequency: $n = 2 + 4 + k + 6 + 3 + 1 = 16 + k$.

Sum: $\sum fx = 2 + 8 + 3k + 24 + 15 + 6 = 55 + 3k$.

$$\bar{x} = \frac{55 + 3k}{16 + k} = 4.5$$

$$55 + 3k = 4.5(16 + k) = 72 + 4.5k$$

$$1.5k = -17$$

$$k = -\frac{17}{1.5} = -\frac{34}{3}$$

Since a frequency cannot be negative, no value of $k$ gives a mean of $4.5$. This is expected: without the $x = 3$ class the mean is $\dfrac{55}{16} \approx 3.44$, and adding more values equal to $3$ can only pull the mean towards $3$, never up to $4.5$. The data as stated is inconsistent with the given mean.


WE-8: Sheppard's Correction (Awareness)

Question:

For grouped data with class width $h$, state Sheppard's correction for the variance and explain when it is appropriate to use it.

Solution:

Sheppard's correction adjusts the grouped data variance:

$$\sigma_{\text{corrected}}^2 = \sigma_{\text{grouped}}^2 - \frac{h^2}{12}$$

This correction is appropriate when:

  • The data is approximately normally distributed.
  • The distribution is continuous.
  • The class intervals are of equal width.

It accounts for the fact that using midpoints assumes data is concentrated at the centre of each class, which slightly overestimates the variance.

In DSE examinations, Sheppard's correction is generally not required unless explicitly asked.


Common Pitfalls

  1. Confusing population variance with sample variance. The population variance formula divides by $n$, while the sample variance divides by $n - 1$ (Bessel's correction). In DSE Maths, unless specified otherwise, use the population formula (divide by $n$).

  2. Forgetting that variance has squared units. If data is in centimetres, the variance is in cm$^2$ and the standard deviation is in cm. Do not mix up units when writing conclusions.

  3. Incorrectly applying coding formulas. For the transformation $y = ax + b$: new mean $= a\bar{x} + b$, new SD $= |a| \cdot s$. The additive constant $b$ does NOT affect the standard deviation. A common error is writing new SD $= as + b$.

  4. Using the wrong formula for combined variance. When combining two data sets, do not simply average the variances. Use the correct formula involving the deviation of each set's mean from the combined mean.

  5. Misidentifying quartile positions. Different textbooks use different conventions for finding $Q_1$ and $Q_3$. In DSE, the most common approach is: $Q_1$ is the median of the lower half and $Q_3$ is the median of the upper half.


DSE Exam-Style Questions

DSE-1

The following table shows the distribution of marks obtained by 40 students in a test.

| Marks | 1–10 | 11–20 | 21–30 | 31–40 | 41–50 |
| --- | --- | --- | --- | --- | --- |
| Frequency | 3 | 8 | 14 | 10 | 5 |

(a) Estimate the mean mark. (3 marks)

(b) Estimate the standard deviation of the marks. (3 marks)

(c) If a student scored 35 marks, find the student's standardised score (z-score). (2 marks)

Solution:

(a) Midpoints: $5.5, 15.5, 25.5, 35.5, 45.5$.

| $x$ | $f$ | $fx$ | $fx^2$ |
| --- | --- | --- | --- |
| 5.5 | 3 | 16.5 | 90.75 |
| 15.5 | 8 | 124 | 1922 |
| 25.5 | 14 | 357 | 9103.5 |
| 35.5 | 10 | 355 | 12602.5 |
| 45.5 | 5 | 227.5 | 10351.25 |
| Total | 40 | 1080 | 34070 |

$$\bar{x} = \frac{1080}{40} = 27$$

(b) $\sigma^2 = \dfrac{34070}{40} - 27^2 = 851.75 - 729 = 122.75$.

$\sigma = \sqrt{122.75} \approx 11.08$.

(c) $z = \dfrac{35 - 27}{11.08} = \dfrac{8}{11.08} \approx 0.72$.

The student scored $0.72$ standard deviations above the mean.


DSE-2

The weights (in kg) of 8 parcels are: $2.3, 3.1, 4.5, 5.2, 3.8, 4.1, 2.9, 3.6$.

(a) Find the mean and standard deviation. (4 marks)

(b) If each parcel has a label weighing $0.1$ kg added, find the new mean and new standard deviation. (2 marks)

(c) If the weight of each parcel is converted to grams, find the new mean and new variance. (2 marks)

Solution:

(a) Sum $= 29.5$, $\bar{x} = \dfrac{29.5}{8} = 3.6875$.

$\sum x^2 = 5.29 + 9.61 + 20.25 + 27.04 + 14.44 + 16.81 + 8.41 + 12.96 = 114.81$.

$\sigma^2 = \dfrac{114.81}{8} - (3.6875)^2 \approx 14.35125 - 13.59766 \approx 0.7536$.

$\sigma = \sqrt{0.7536} \approx 0.868$ kg.

(b) Transformation: $y = x + 0.1$.

New mean $= 3.6875 + 0.1 = 3.7875$ kg.

New SD $= 0.868$ kg (unchanged by addition).

(c) Transformation: $y = 1000x$.

New mean $= 1000 \times 3.6875 = 3687.5$ g.

New variance $= 1000^2 \times 0.7536 \approx 753600$ g$^2$ (approximate, since $0.7536$ is itself a rounded value).


DSE-3

Two classes took the same examination. Class $A$ (30 students) had mean $72$ and variance $36$. Class $B$ (20 students) had mean $65$ and variance $64$.

(a) Find the overall mean of all 50 students. (2 marks)

(b) Find the overall variance of all 50 students. (4 marks)

Solution:

(a) Combined mean $= \dfrac{30 \times 72 + 20 \times 65}{50} = \dfrac{2160 + 1300}{50} = \dfrac{3460}{50} = 69.2$.

(b) $d_A = 72 - 69.2 = 2.8$, $d_B = 65 - 69.2 = -4.2$.

$$\sigma^2 = \frac{30(36 + 2.8^2) + 20(64 + 4.2^2)}{50} = \frac{30(36 + 7.84) + 20(64 + 17.64)}{50}$$

$$= \frac{30 \times 43.84 + 20 \times 81.64}{50} = \frac{1315.2 + 1632.8}{50} = \frac{2948}{50} = 58.96$$


DSE-4

The box-and-whisker diagram below summarises the daily temperatures (in °C) recorded in a city for 30 days:

Minimum $= 12$, $Q_1 = 18$, Median $= 22$, $Q_3 = 28$, Maximum $= 35$.

(a) Find the interquartile range. (1 mark)

(b) Determine the lower and upper fences and identify any outliers. (3 marks)

(c) What percentage of the data lies between $18$ and $28$? (1 mark)

Solution:

(a) $\text{IQR} = Q_3 - Q_1 = 28 - 18 = 10$.

(b) Lower fence $= Q_1 - 1.5 \times \text{IQR} = 18 - 15 = 3$.

Upper fence $= Q_3 + 1.5 \times \text{IQR} = 28 + 15 = 43$.

Since all values ($12$ to $35$) lie within $[3, 43]$, there are no outliers.

(c) By definition, $50\%$ of the data lies between $Q_1$ and $Q_3$.


DSE-5

A set of data $x_1, x_2, \ldots, x_n$ has mean $\mu$ and standard deviation $\sigma$. A new data set is formed by removing the value $\mu$ from the original set.

(a) Find the mean of the new data set. (2 marks)

(b) Find the variance of the new data set in terms of $\sigma$ and $n$. (4 marks)

Solution:

(a) Original sum $= n\mu$. After removing $\mu$: new sum $= (n-1)\mu$.

New mean $= \dfrac{(n-1)\mu}{n-1} = \mu$.

(b) Original $\sum x_i^2 = n(\sigma^2 + \mu^2)$.

After removing $\mu$: $\sum x_i^2 = n(\sigma^2 + \mu^2) - \mu^2 = n\sigma^2 + (n-1)\mu^2$.

New variance $= \dfrac{n\sigma^2 + (n-1)\mu^2}{n-1} - \mu^2 = \dfrac{n\sigma^2}{n-1}$.
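The result can be sanity-checked on a concrete case. The data set below is an assumption chosen so that the mean is itself one of the values; `pvar` is a hypothetical helper for the population variance.

```python
from fractions import Fraction

data = [1, 3, 5, 7, 9]          # illustrative; the mean 5 is a data value

def pvar(xs):
    """Population variance (divide by n), computed exactly."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = len(data)
mu = Fraction(sum(data), n)
sigma2 = pvar(data)

reduced = list(data)
reduced.remove(mu)              # drop one occurrence of the mean
print(pvar(reduced), n * sigma2 / (n - 1))   # 10 10
```

The variance rises after the removal because a value with zero deviation has been taken out while the total squared deviation stays the same.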