Probability theory provides a rigorous mathematical framework for quantifying uncertainty. In the
DSE compulsory syllabus, we focus on discrete probability spaces, combinatorial counting,
conditional probability, and independence. This page connects to combinatorics
for counting techniques and to dispersion for the statistical interpretation of
probability distributions.
Notations
| Symbol | Meaning |
|---|---|
| P(A) | Probability of event A |
| P(A∣B) | Conditional probability of A given B |
| A∩B | Intersection: both A and B occur |
| A∪B | Union: A or B or both occur |
| A′ or Ā | Complement of A: A does not occur |
| A⊆B | A is a subset of B |
| ∅ | Empty set (impossible event) |
| S or Ω | Sample space (universal set of all outcomes) |
Basic Probability and Sample Spaces
Definitions
Sample space S (or Ω): the set of all possible outcomes of a random experiment.
Event: a subset of the sample space. Events that consist of exactly one outcome are called
elementary events (or simple events).
Exhaustive events: a collection of events $\{E_1, E_2, \ldots, E_n\}$ is exhaustive if
$E_1 \cup E_2 \cup \cdots \cup E_n = S$. Every possible outcome is covered.
Partition: a collection of events is a partition of S if they are pairwise mutually exclusive
and exhaustive. Each outcome in S belongs to exactly one event in the partition.
Classical (Laplace) Definition
If all elementary outcomes in a finite sample space S are equally likely, then for any event A:
$$P(A) = \frac{|A|}{|S|} = \frac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$$
This definition reduces probability to a counting problem and directly connects to the techniques in
combinatorics.
Example
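A fair six-sided die is rolled. The sample space is $S = \{1, 2, 3, 4, 5, 6\}$.
- $P(\text{even}) = \frac{|\{2, 4, 6\}|}{|S|} = \frac{3}{6} = \frac{1}{2}$.
- $P(\text{prime}) = \frac{|\{2, 3, 5\}|}{|S|} = \frac{3}{6} = \frac{1}{2}$.
- $P(\text{even and prime}) = \frac{|\{2\}|}{|S|} = \frac{1}{6}$.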
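The counting in this example can be checked mechanically. The following is a minimal sketch, assuming equally likely outcomes; the helper name `prob` is illustrative and not part of the syllabus notation.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Classical probability |A| / |S|, valid only for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

S = {1, 2, 3, 4, 5, 6}   # fair six-sided die
even = {2, 4, 6}
prime = {2, 3, 5}

p_even = prob(even, S)
p_prime = prob(prime, S)
p_even_and_prime = prob(even & prime, S)   # intersection: both occur

print(p_even, p_prime, p_even_and_prime)   # 1/2 1/2 1/6
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.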
Frequentist Interpretation
If an experiment is repeated $n$ times under identical conditions and event $A$ occurs $n_A$ times,
then:
$$P(A) = \lim_{n \to \infty} \frac{n_A}{n}$$
In practice, $\frac{n_A}{n}$ is used as an estimate of $P(A)$ for large $n$. The frequentist
interpretation motivates the axioms: long-run relative frequencies behave consistently with the
rules below.
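A short simulation can illustrate the relative-frequency estimate. This is only a sketch: the function name `relative_frequency`, the choice of event ("the die shows an even number"), and the fixed seed are assumptions made for reproducibility.

```python
import random

def relative_frequency(n, seed=0):
    """Estimate P(even) for a fair die as n_A / n over n simulated rolls."""
    rng = random.Random(seed)   # seeded generator so the run is repeatable
    n_a = sum(1 for _ in range(n) if rng.randint(1, 6) % 2 == 0)
    return n_a / n

estimate = relative_frequency(100_000)
print(estimate)   # close to the classical value 0.5
```

With 100,000 rolls the estimate typically lands within about 0.01 of the classical value; smaller `n` gives noticeably rougher estimates.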
Axioms of Probability (Kolmogorov)
Any probability measure P defined on a sample space S must satisfy three axioms:
- Non-negativity: P(A)≥0 for every event A⊆S.
- Normalization: P(S)=1.
- Additivity: For any countable collection of pairwise mutually exclusive events
$A_1, A_2, \ldots$:
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
These three axioms are the foundation of all probability theory. Every theorem and formula on this
page derives from them.
Fundamental Theorems
Theorem 1. P(∅)=0.
Proof. S and ∅ are mutually exclusive and S∪∅=S. By Axiom 3:
P(S)=P(S∪∅)=P(S)+P(∅)
By Axiom 2, P(S)=1, so 1=1+P(∅), hence P(∅)=0. □
Theorem 2. P(A′)=1−P(A).
Proof. A and A′ are mutually exclusive and A∪A′=S. By Axioms 2 and 3:
1=P(S)=P(A∪A′)=P(A)+P(A′)
Therefore P(A′)=1−P(A). □
Theorem 3. If A⊆B, then P(A)≤P(B).
Proof. Decompose B=A∪(B∩A′). Since A and B∩A′ are mutually exclusive, by
Axiom 3:
P(B)=P(A)+P(B∩A′)
By Axiom 1, P(B∩A′)≥0, so P(B)≥P(A). □
Corollary. 0≤P(A)≤1 for any event A.
Proof. Since ∅⊆A⊆S, Theorem 3 gives
P(∅)≤P(A)≤P(S), i.e., 0≤P(A)≤1. □
Example
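Let $S = \{1, 2, 3, 4, 5, 6\}$ with uniform probability. Let $A = \{1, 2\}$ and
$B = \{1, 2, 3, 4\}$.
- $A \subseteq B$: confirmed since every element of $A$ is in $B$.
- $P(A) = \frac{2}{6} = \frac{1}{3}$, $P(B) = \frac{4}{6} = \frac{2}{3}$.
- $P(A) \le P(B)$: $\frac{1}{3} \le \frac{2}{3}$. ✓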
Addition Rule
General Addition Rule
Theorem. For any two events A and B:
P(A∪B)=P(A)+P(B)−P(A∩B)
Proof. Decompose B=(A∩B)∪(B∩A′). These are mutually exclusive, so:
$$P(B) = P(A \cap B) + P(B \cap A') \implies P(B \cap A') = P(B) - P(A \cap B)$$
Now A∪B=A∪(B∩A′), and A and B∩A′ are mutually exclusive:
$$P(A \cup B) = P(A) + P(B \cap A') = P(A) + P(B) - P(A \cap B) \qquad \square$$
The subtraction of P(A∩B) corrects for double-counting: outcomes in both A and B are
counted once in P(A) and once in P(B), so we subtract one copy.
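The double-counting correction can be seen numerically. The sketch below assumes a fair die with the illustrative events A = "even" and B = "prime", which overlap in the outcome 2; these events are chosen here for demonstration only.

```python
from fractions import Fraction

S = set(range(1, 7))   # fair die
A = {2, 4, 6}          # even
B = {2, 3, 5}          # prime

def P(event):
    """Classical probability on the equally likely sample space S."""
    return Fraction(len(event), len(S))

lhs = P(A | B)                   # probability of the union, counted directly
rhs = P(A) + P(B) - P(A & B)     # general addition rule
naive = P(A) + P(B)              # forgets the overlap, so it overcounts

print(lhs, rhs, naive)           # 5/6 5/6 1
```

The naive sum exceeds the true value by exactly `P(A & B)`, which is the copy the rule subtracts.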
Mutually Exclusive Events
Two events A and B are mutually exclusive (disjoint) if A∩B=∅, i.e., they
cannot occur simultaneously.
When A∩B=∅, the general addition rule reduces to:
P(A∪B)=P(A)+P(B)
This follows directly because P(A∩B)=P(∅)=0 by Theorem 1.
Example
A card is drawn from a standard 52-card deck.
- Let A = "drawing a king", B = "drawing a queen".
- A∩B=∅ (a card cannot be both a king and a queen).
- $P(A \cup B) = P(A) + P(B) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13}$.
Extension to Three Events
P(A∪B∪C)=P(A)+P(B)+P(C)−P(A∩B)−P(A∩C)−P(B∩C)+P(A∩B∩C)
Proof sketch. Apply the two-event rule to P((A∪B)∪C):
P((A∪B)∪C)=P(A∪B)+P(C)−P((A∪B)∩C)=[P(A)+P(B)−P(A∩B)]+P(C)−P((A∩C)∪(B∩C))
Applying the two-event rule to the last term:
P((A∩C)∪(B∩C))=P(A∩C)+P(B∩C)−P(A∩B∩C)
Substituting back and rearranging yields the stated result. □
The pattern continues: for n events, you alternate adding and subtracting terms of each
intersection size. This is the probability analogue of the inclusion-exclusion principle.
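The alternating pattern can be written as a short loop over intersection sizes. This is a sketch for equally likely outcomes; the function name `union_probability` and the test events (multiples of 2, 3, and 5 among the integers 1 to 30) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, sample_space):
    """P(A_1 ∪ ... ∪ A_n) by inclusion-exclusion on an equally likely space."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1            # add odd sizes, subtract even sizes
        for group in combinations(events, k):
            common = set.intersection(*group)     # intersection of the k chosen events
            total += sign * Fraction(len(common), len(sample_space))
    return total

S = set(range(1, 31))
A = {n for n in S if n % 2 == 0}
B = {n for n in S if n % 3 == 0}
C = {n for n in S if n % 5 == 0}

p_union = union_probability([A, B, C], S)
print(p_union)   # 11/15
```

For three events the loop reproduces the seven-term formula above; the direct count `len(A | B | C) / len(S)` gives the same value.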
Inclusion-Exclusion Principle
The addition rule is the probability counterpart of the inclusion-exclusion principle for counting:
∣A∪B∣=∣A∣+∣B∣−∣A∩B∣
When outcomes are equally likely, dividing both sides by ∣S∣ yields the general addition rule. The
full inclusion-exclusion principle extends to n sets and is covered in
combinatorics.
DSE-style Example
In a class of 40 students, 25 study Physics, 20 study Chemistry, and 8 study both. A student is
chosen at random. Find the probability that the student studies at least one of the two subjects.
Let P = studies Physics, C = studies Chemistry.
$$P(P \cup C) = P(P) + P(C) - P(P \cap C) = \frac{25}{40} + \frac{20}{40} - \frac{8}{40} = \frac{37}{40}$$
The probability that the student studies neither subject:
$$P((P \cup C)') = 1 - \frac{37}{40} = \frac{3}{40}$$
The addition rule generalises naturally. For any number of events, the key insight is: add
all individual probabilities, subtract all pairwise intersections, add back all triple
intersections, and so on, alternating signs.
Conditional Probability
Definition
The conditional probability of A given B is:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$
Intuition: Shrinking the Sample Space
Conditioning on B means we restrict our universe to outcomes where B has already occurred. The
sample space effectively shrinks from S to B. Within this restricted space, the probability of
A is the proportion of B that also belongs to A:
$$P(A \mid B) = \frac{|A \cap B|}{|B|}$$
This is equivalent to $\frac{P(A \cap B)}{P(B)}$ when all outcomes are equally likely.
Example
A fair die is rolled. Given that the result is even, find the probability that it is greater than 4.
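- $B$ = "even" = $\{2, 4, 6\}$, so $P(B) = \frac{3}{6} = \frac{1}{2}$.
- $A$ = "greater than 4" = $\{5, 6\}$.
- $A \cap B = \{6\}$, so $P(A \cap B) = \frac{1}{6}$.
- $P(A \mid B) = \frac{1/6}{1/2} = \frac{1}{3}$.
Verification by shrinking: within $\{2, 4, 6\}$, only 6 is greater than 4, so $P(A \mid B) = \frac{1}{3}$. ✓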
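Both routes to the answer, the definition and the shrunken sample space, can be compared in code. A minimal sketch assuming equally likely outcomes; the helper names `P` and `conditional` are illustrative.

```python
from fractions import Fraction

S = set(range(1, 7))                  # fair die
B = {n for n in S if n % 2 == 0}      # "even"
A = {n for n in S if n > 4}           # "greater than 4"

def P(event):
    return Fraction(len(event), len(S))

def conditional(a, b):
    """P(a | b) by shrinking the sample space to b; b must be non-empty."""
    if not b:
        raise ValueError("conditioning event must have positive probability")
    return Fraction(len(a & b), len(b))

by_definition = P(A & B) / P(B)       # P(A ∩ B) / P(B)
by_shrinking = conditional(A, B)      # |A ∩ B| / |B|

print(by_definition, by_shrinking)    # 1/3 1/3
```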
Fundamental Properties
Theorem. P(A∣B)=1−P(A′∣B).
Proof. Since A∩B and A′∩B partition B (they are mutually exclusive and their union
is B):
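$$\begin{aligned}
P(B) &= P(A \cap B) + P(A' \cap B) \\
1 &= \frac{P(A \cap B)}{P(B)} + \frac{P(A' \cap B)}{P(B)} \\
1 &= P(A \mid B) + P(A' \mid B)
\end{aligned}$$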
Therefore P(A∣B)=1−P(A′∣B). □
This is the conditional analogue of Theorem 2: the conditional probabilities of an event and its
complement must sum to 1 within the conditioned space.
Multiplication Rule
Theorem. P(A∩B)=P(A)⋅P(B∣A)=P(B)⋅P(A∣B).
Proof. From the definition of conditional probability:
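$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \implies P(A \cap B) = P(B) \cdot P(A \mid B)$$
By symmetry, swapping A and B:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \implies P(A \cap B) = P(A) \cdot P(B \mid A) \qquad \square$$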
For three events, the multiplication rule extends naturally:
P(A∩B∩C)=P(A)⋅P(B∣A)⋅P(C∣A∩B)
This follows by two applications of the two-event rule:
P(A∩B∩C)=P(A∩B)⋅P(C∣A∩B)=P(A)⋅P(B∣A)⋅P(C∣A∩B)
Example
A bag contains 5 red and 3 blue balls. Two balls are drawn without replacement. Find the probability
that both are red.
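- $P(\text{1st red}) = \frac{5}{8}$.
- $P(\text{2nd red} \mid \text{1st red}) = \frac{4}{7}$.
- $P(\text{both red}) = \frac{5}{8} \times \frac{4}{7} = \frac{20}{56} = \frac{5}{14}$.
With replacement, the second draw is unaffected:
- $P(\text{both red, with replacement}) = \frac{5}{8} \times \frac{5}{8} = \frac{25}{64}$.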
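The two answers can be confirmed by listing every ordered pair of draws. This is a brute-force sketch; treating the eight balls as distinguishable (indexed 0 to 7) is an assumption that makes all ordered pairs equally likely.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["R"] * 5 + ["B"] * 3   # 5 red, 3 blue, indexed 0..7

def both_red(pair):
    return all(balls[i] == "R" for i in pair)

# Without replacement: ordered pairs of two different balls (8 * 7 = 56).
without = list(permutations(range(8), 2))
p_without = Fraction(sum(both_red(p) for p in without), len(without))

# With replacement: the same ball may be drawn twice (8 * 8 = 64).
with_repl = list(product(range(8), repeat=2))
p_with = Fraction(sum(both_red(p) for p in with_repl), len(with_repl))

print(p_without, p_with)   # 5/14 25/64
```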
Independence
Definition
Two events A and B are independent if and only if:
P(A∩B)=P(A)⋅P(B)
Equivalent Characterisation
Theorem. A and B are independent if and only if P(A∣B)=P(A) (provided
P(B)>0).
Proof.
(⇒) If independent, P(A∩B)=P(A)⋅P(B), so:
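$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A)$$
(⇐) If P(A∣B)=P(A), then:
$$\frac{P(A \cap B)}{P(B)} = P(A) \implies P(A \cap B) = P(A) \cdot P(B) \qquad \square$$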
Independence means that knowing B occurred provides zero information about A. The conditional
probability is unchanged from the unconditional one.
Complement Independence
Theorem. If A and B are independent, then each of the following pairs is also independent:
A and B′, A′ and B, A′ and B′.
Proof. We show A and B′ are independent. The others follow by identical reasoning.
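$$\begin{aligned}
P(A \cap B') &= P(A) - P(A \cap B) && \text{(since } A = (A \cap B) \cup (A \cap B')\text{)} \\
&= P(A) - P(A) \cdot P(B) && \text{(by independence)} \\
&= P(A)\,(1 - P(B)) \\
&= P(A) \cdot P(B') \qquad \square
\end{aligned}$$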
Example
Two fair coins are tossed. Let A = "first coin is heads" and B = "second coin is tails".
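- $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{2}$, $P(A \cap B) = \frac{1}{4}$.
- $P(A) \cdot P(B) = \frac{1}{4} = P(A \cap B)$. Independent. ✓
Now A′ = "first coin is tails" and B′ = "second coin is heads":
- $P(A') = \frac{1}{2}$, $P(B') = \frac{1}{2}$, $P(A' \cap B') = P(\text{tails}, \text{heads}) = \frac{1}{4}$.
- $P(A') \cdot P(B') = \frac{1}{4} = P(A' \cap B')$. Independent. ✓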
Common Pitfall: Mutually Exclusive ≠ Independent
Theorem. If A and B are mutually exclusive with P(A)>0 and P(B)>0, then A
and B are not independent.
Proof. If A∩B=∅, then P(A∩B)=0. But P(A)⋅P(B)>0 since both
factors are positive. Therefore P(A∩B)≠P(A)⋅P(B), so A and B are not
independent. □
Intuition: mutually exclusive events carry strong negative information about each other -- knowing
one occurred guarantees the other did not. Independence means no information transfer at all. These
are opposite extremes.
The only case where mutually exclusive events are also independent is the degenerate case where at
least one event has probability zero.
DSE-style Example
A fair coin is tossed 3 times. Let A = "all three tosses are heads" and B = "the first toss is
tails".
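- $P(A) = \frac{1}{8}$, $P(B) = \frac{1}{2}$.
- $A \cap B = \emptyset$ (cannot have all heads if the first is tails), so $P(A \cap B) = 0$.
- Check independence:
$P(A) \cdot P(B) = \frac{1}{8} \times \frac{1}{2} = \frac{1}{16} \neq 0 = P(A \cap B)$.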
- A and B are mutually exclusive but not independent.
For a valid independence example in the same experiment: let C = "first toss is heads" and D =
"second toss is tails". Then
$P(C \cap D) = \frac{1}{4} = \frac{1}{2} \times \frac{1}{2} = P(C) \cdot P(D)$, so C and D are
independent.
When testing independence, always compute both P(A∩B) and P(A)⋅P(B)
separately and compare. Do not assume independence from the problem description -- it must be
verified or explicitly stated.
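The compare-both-sides check is easy to automate on a small sample space. The sketch below enumerates the eight outcomes of three coin tosses and tests the pairs from the example; the helper name `independent` is illustrative.

```python
from fractions import Fraction
from itertools import product

S = set(product("HT", repeat=3))   # 8 equally likely outcomes

def P(event):
    return Fraction(len(event), len(S))

def independent(x, y):
    """True exactly when P(X ∩ Y) equals P(X) * P(Y)."""
    return P(x & y) == P(x) * P(y)

A = {s for s in S if s == ("H", "H", "H")}   # all three heads
B = {s for s in S if s[0] == "T"}            # first toss is tails
C = {s for s in S if s[0] == "H"}            # first toss is heads
D = {s for s in S if s[1] == "T"}            # second toss is tails

print(independent(A, B))   # False: mutually exclusive, both with positive probability
print(independent(C, D))   # True
```

Exact fractions matter here: with floating-point probabilities, an equality test of this kind can fail through rounding alone.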
Bayes' Theorem
Statement
For two events A and B with P(A)>0 and P(B)>0:
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Proof
Starting from the definition of conditional probability and the multiplication rule:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)} \qquad \square$$
Bayes' theorem "reverses" the conditioning: it expresses P(A∣B) in terms of P(B∣A).
This is the mathematical foundation of statistical inference -- updating beliefs given evidence.
Law of Total Probability
If $B_1, B_2, \ldots, B_n$ form a partition of $S$ (pairwise mutually exclusive, exhaustive, and
$P(B_i) > 0$ for all $i$), then for any event $A$:
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$
Proof. Since the $B_i$ partition $S$, the events $A \cap B_1, A \cap B_2, \ldots, A \cap B_n$ are
pairwise mutually exclusive and their union equals $A$. By Axiom 3:
$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i) \qquad \square$$
Extended Bayes' Theorem
Combining Bayes' theorem with the law of total probability gives the most useful form. If
$B_1, \ldots, B_n$ partition $S$ and $A$ is one of them (say $A = B_j$), then for any event $E$ with
$P(E) > 0$:
$$P(B_j \mid E) = \frac{P(E \mid B_j) \cdot P(B_j)}{\sum_{i=1}^{n} P(E \mid B_i) \cdot P(B_i)}$$
The denominator is P(E) computed via the law of total probability.
Example: Medical Testing
A disease affects 1% of a population. A test has:
- Sensitivity (true positive rate): $P(\text{positive} \mid \text{disease}) = 0.95$.
- False positive rate: $P(\text{positive} \mid \text{no disease}) = 0.02$.
A person tests positive. What is the probability they actually have the disease?
Let D = has disease, D′ = no disease, + = tests positive. The partition is {D,D′}.
By Bayes' theorem:
$$P(D \mid +) = \frac{P(+ \mid D) \cdot P(D)}{P(+ \mid D) \cdot P(D) + P(+ \mid D') \cdot P(D')} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.02 \times 0.99} = \frac{0.0095}{0.0095 + 0.0198} = \frac{0.0095}{0.0293} \approx 0.324$$
Despite the test's 95% sensitivity, a positive result implies only about a 32.4% chance of disease. This
counterintuitive result occurs because the disease is rare: false positives (0.0198) outnumber true
positives (0.0095) by roughly two to one in absolute terms.
Example: Quality Control
A factory has three machines producing items. Machine $M_1$ produces 50% of items with 2% defective.
Machine $M_2$ produces 30% with 3% defective. Machine $M_3$ produces 20% with 5% defective. An item
is randomly selected and found to be defective. What is the probability it came from $M_3$?
Let $D$ = defective. The partition is $\{M_1, M_2, M_3\}$.
$$P(M_3 \mid D) = \frac{P(D \mid M_3) \cdot P(M_3)}{P(D \mid M_1) \cdot P(M_1) + P(D \mid M_2) \cdot P(M_2) + P(D \mid M_3) \cdot P(M_3)} = \frac{0.05 \times 0.20}{0.02 \times 0.50 + 0.03 \times 0.30 + 0.05 \times 0.20} = \frac{0.01}{0.01 + 0.009 + 0.01} = \frac{0.01}{0.029} \approx 0.345$$
Despite $M_3$ having the highest defect rate, it only accounts for about 34.5% of defective items
because it produces the smallest share of total output.
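Both worked examples follow the same recipe, which can be captured in one small function. This is a sketch, not a standard library routine: the name `posterior` and the dictionary layout (hypothesis label mapped to prior or likelihood) are assumptions made for illustration.

```python
def posterior(priors, likelihoods, target):
    """P(target | evidence) from priors P(B_i) and likelihoods P(evidence | B_i).

    The denominator is P(evidence), computed by the law of total probability.
    """
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / evidence

# Medical testing: partition {D, D'}.
p_disease = posterior(
    priors={"D": 0.01, "not D": 0.99},
    likelihoods={"D": 0.95, "not D": 0.02},
    target="D",
)

# Quality control: partition {M1, M2, M3}.
p_m3 = posterior(
    priors={"M1": 0.50, "M2": 0.30, "M3": 0.20},
    likelihoods={"M1": 0.02, "M2": 0.03, "M3": 0.05},
    target="M3",
)

print(round(p_disease, 3), round(p_m3, 3))   # 0.324 0.345
```

The priors must sum to 1 for the result to be meaningful; the sketch does not check this.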
Probability Trees
Construction
A probability tree is a directed graph that decomposes a multi-stage experiment into sequential
branches. Each level of the tree represents a stage, each branch represents an outcome at that
stage, and each branch is labelled with its probability.
Rules:
- The probabilities on branches from each node must sum to 1.
- The probability of any complete path (root to leaf) is the product of all branch probabilities
along that path (multiplication rule).
- The probability of an event is the sum of probabilities of all paths leading to that event
(addition rule for mutually exclusive paths).
Worked Example
Two balls are drawn without replacement from a bag containing 4 red and 2 blue balls.
Tree diagram (text representation)
- Root
  - 4/6 (R)
    - 3/5 (R) => P = 4/6 × 3/5 = 12/30
    - 2/5 (B) => P = 4/6 × 2/5 = 8/30
  - 2/6 (B)
    - 4/5 (R) => P = 2/6 × 4/5 = 8/30
    - 1/5 (B) => P = 2/6 × 1/5 = 2/30
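Verification: all path probabilities sum to $\frac{12 + 8 + 8 + 2}{30} = \frac{30}{30} = 1$. ✓
- $P(\text{both red}) = \frac{12}{30} = \frac{2}{5}$.
- $P(\text{both blue}) = \frac{2}{30} = \frac{1}{15}$.
- $P(\text{same colour}) = \frac{12}{30} + \frac{2}{30} = \frac{14}{30} = \frac{7}{15}$.
- $P(\text{different colours}) = \frac{8}{30} + \frac{8}{30} = \frac{16}{30} = \frac{8}{15}$.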
Note: P(same)+P(different)=1, as expected since these events are
complements.
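A tree like this one can be generated recursively: each call handles one stage and multiplies the branch probability into the path. The following is a sketch under the assumption that the bag is described by colour counts; the generator name `paths` is illustrative.

```python
from fractions import Fraction

def paths(bag, draws):
    """Yield (sequence, probability) for each root-to-leaf path, drawing without replacement."""
    if draws == 0:
        yield (), Fraction(1)
        return
    total = sum(bag.values())
    for colour, count in bag.items():
        if count == 0:
            continue                                   # no branch for an exhausted colour
        branch = Fraction(count, total)                # probability on this branch
        remaining = {**bag, colour: count - 1}         # bag after removing one ball
        for seq, p in paths(remaining, draws - 1):
            yield (colour,) + seq, branch * p          # multiply along the path

leaves = dict(paths({"R": 4, "B": 2}, 2))
p_same = leaves[("R", "R")] + leaves[("B", "B")]       # add across exclusive paths
p_different = leaves[("R", "B")] + leaves[("B", "R")]

print(p_same, p_different)   # 7/15 8/15
```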
Connection to Multiplication and Addition Rules
Probability trees are a visual encoding of the multiplication and addition rules:
- Along a path (sequential stages): multiply probabilities -- this is the multiplication rule.
- Across paths (mutually exclusive ways to reach an event): add probabilities -- this is the
addition rule for mutually exclusive events.
Trees are especially useful for problems involving:
- Sequential draws with or without replacement.
- Multi-step processes where later probabilities depend on earlier outcomes.
- Any scenario requiring the law of total probability (sum over all branches at the final level).
DSE-style Example
A box contains 3 defective and 7 good bulbs. Bulbs are tested one by one without replacement. Find
the probability that the second defective bulb is found on the third test.
The second defective is found on the third test means: exactly one defective in the first two tests,
and the third is defective.
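Case 1: Good, then Defective, then Defective:
$$P = \frac{7}{10} \times \frac{3}{9} \times \frac{2}{8} = \frac{42}{720} = \frac{7}{120}$$
Case 2: Defective, then Good, then Defective:
$$P = \frac{3}{10} \times \frac{7}{9} \times \frac{2}{8} = \frac{42}{720} = \frac{7}{120}$$
Total probability (addition rule for mutually exclusive cases):
$$P = \frac{7}{120} + \frac{7}{120} = \frac{14}{120} = \frac{7}{60}$$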
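The case analysis can be cross-checked by brute force over every ordered sequence of three tests. This sketch assumes the ten bulbs are distinguishable (indexed 0 to 9), so that all 720 ordered triples are equally likely.

```python
from fractions import Fraction
from itertools import permutations

bulbs = ["D"] * 3 + ["G"] * 7   # 3 defective, 7 good, indexed 0..9

def second_defective_on_third(seq):
    """Third test is defective and exactly one of the first two is defective."""
    first_two = sum(bulbs[i] == "D" for i in seq[:2])
    return bulbs[seq[2]] == "D" and first_two == 1

sequences = list(permutations(range(10), 3))   # 10 * 9 * 8 = 720 ordered triples
favourable = sum(second_defective_on_third(s) for s in sequences)
p = Fraction(favourable, len(sequences))

print(favourable, p)   # 84 7/60
```

The 84 favourable sequences split evenly between the two cases, 42 each, matching the hand calculation.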
Wrap-up Questions
- Question: A fair coin is tossed three times. Find the probability of getting at least two
heads.
Answer
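The sample space has $2^3 = 8$ equally likely outcomes.
At least two heads means 2 or 3 heads:
- 2 heads: HHT, HTH, THH (3 outcomes).
- 3 heads: HHH (1 outcome).
$$P(\text{at least 2 heads}) = \frac{4}{8} = \frac{1}{2}$$
Alternatively, using the binomial formula:
$$P(\text{at least 2 heads}) = \binom{3}{2}\left(\frac{1}{2}\right)^3 + \binom{3}{3}\left(\frac{1}{2}\right)^3 = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}$$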
- Question: In a group of 50 students, 30 play basketball, 25 play football, and 10 play
neither. A student is chosen at random. Find the probability that the student plays both sports.
Answer
Let B = plays basketball, F = plays football.
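- $P(B) = \frac{30}{50} = \frac{3}{5}$, $P(F) = \frac{25}{50} = \frac{1}{2}$.
- 10 play neither, so 40 play at least one: $P(B \cup F) = \frac{40}{50} = \frac{4}{5}$.
By the addition rule:
$$P(B \cap F) = P(B) + P(F) - P(B \cup F) = \frac{3}{5} + \frac{1}{2} - \frac{4}{5} = \frac{6}{10} + \frac{5}{10} - \frac{8}{10} = \frac{3}{10}$$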
- Question: A bag contains 4 white and 6 black balls. Two balls are drawn at random without
replacement. Find the probability that they are of different colours.
Answer
Method 1 (direct): white then black, or black then white.
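$$P = \frac{4}{10} \times \frac{6}{9} + \frac{6}{10} \times \frac{4}{9} = \frac{24}{90} + \frac{24}{90} = \frac{48}{90} = \frac{8}{15}$$
Method 2 (complement):
$$\begin{aligned}
P(\text{different}) &= 1 - P(\text{same}) = 1 - P(\text{both white}) - P(\text{both black}) \\
P(\text{both white}) &= \frac{4}{10} \times \frac{3}{9} = \frac{12}{90} \\
P(\text{both black}) &= \frac{6}{10} \times \frac{5}{9} = \frac{30}{90} \\
P(\text{different}) &= 1 - \frac{12}{90} - \frac{30}{90} = \frac{48}{90} = \frac{8}{15} \quad \checkmark
\end{aligned}$$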
- Question: Events A and B are such that P(A)=0.6, P(B)=0.5, and
P(A∣B)=0.4. Find P(A∪B).
Answer
From $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$:
$$P(A \cap B) = P(A \mid B) \cdot P(B) = 0.4 \times 0.5 = 0.2$$
By the addition rule:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.6 + 0.5 - 0.2 = 0.9$$
- Question: A and B are independent events with P(A)=0.3 and P(B)=0.5. Find
P(A′∩B′).
Answer
By the complement independence theorem, since A and B are independent, A′ and B′ are also
independent:
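$$P(A' \cap B') = P(A') \cdot P(B') = (1 - 0.3)(1 - 0.5) = 0.7 \times 0.5 = 0.35$$
Verification via complement: $P(A' \cap B') = P((A \cup B)') = 1 - P(A \cup B)$.
$$\begin{aligned}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) = 0.3 + 0.5 - 0.15 = 0.65 \\
P(A' \cap B') &= 1 - 0.65 = 0.35 \quad \checkmark
\end{aligned}$$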
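- Question: Two events A and B satisfy $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{4}$, and
$P(A \cup B) = \frac{5}{12}$. Determine whether A and B are independent.
Answer
By the addition rule:
$$P(A \cap B) = P(A) + P(B) - P(A \cup B) = \frac{1}{3} + \frac{1}{4} - \frac{5}{12} = \frac{4}{12} + \frac{3}{12} - \frac{5}{12} = \frac{2}{12} = \frac{1}{6}$$
Check independence:
$$P(A) \cdot P(B) = \frac{1}{3} \times \frac{1}{4} = \frac{1}{12} \neq \frac{1}{6} = P(A \cap B)$$
Since $P(A \cap B) \neq P(A) \cdot P(B)$, the events are not independent.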
- Question: A box contains 5 red, 3 green, and 2 blue marbles. Three marbles are drawn without
replacement. Find the probability that all three are the same colour.
Answer
The three colours are mutually exclusive cases, so by the addition rule:
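$$\begin{aligned}
P(\text{all red}) &= \frac{5}{10} \times \frac{4}{9} \times \frac{3}{8} = \frac{60}{720} = \frac{1}{12} \\
P(\text{all green}) &= \frac{3}{10} \times \frac{2}{9} \times \frac{1}{8} = \frac{6}{720} = \frac{1}{120} \\
P(\text{all blue}) &= \frac{2}{10} \times \frac{1}{9} \times \frac{0}{8} = 0 \\
P(\text{all same colour}) &= \frac{1}{12} + \frac{1}{120} + 0 = \frac{10}{120} + \frac{1}{120} = \frac{11}{120}
\end{aligned}$$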
- Question: In a certain school, 60% of students take Mathematics, 40% take Physics, and 30%
take both. A student is selected at random. Given that the student takes Mathematics, what is the
probability that they also take Physics?
Answer
$$P(\text{Physics} \mid \text{Maths}) = \frac{P(\text{Physics} \cap \text{Maths})}{P(\text{Maths})} = \frac{0.30}{0.60} = 0.5$$
Half of Mathematics students also take Physics.
- Question: A factory produces items using Machine X (60% of output) and Machine Y (40% of
output). The defect rates are 3% for X and 7% for Y. An item is found to be defective. Use
Bayes' theorem to find the probability it was produced by Machine X.
Answer
Let D = defective. The partition is {X,Y}.
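$$P(X \mid D) = \frac{P(D \mid X) \cdot P(X)}{P(D \mid X) \cdot P(X) + P(D \mid Y) \cdot P(Y)} = \frac{0.03 \times 0.60}{0.03 \times 0.60 + 0.07 \times 0.40} = \frac{0.018}{0.018 + 0.028} = \frac{0.018}{0.046} \approx 0.391$$
Despite producing 60% of items, Machine X accounts for only about 39.1% of defective items because
its defect rate is lower.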
- Question: A fair die is rolled twice. Find the probability that the sum of the two results
is 8, given that the first result is at least 3.
Answer
Let A = "sum is 8", B = "first result ≥ 3".
Sample space: 6×6=36 equally likely outcomes.
∣B∣=4×6=24 (first die shows 3, 4, 5, or 6).
A∩B = outcomes with first ≥ 3 and sum 8: (3,5),(4,4),(5,3),(6,2), so
∣A∩B∣=4.
$$P(A \mid B) = \frac{|A \cap B|}{|B|} = \frac{4}{24} = \frac{1}{6}$$
For comparison, the unconditional probability is $P(A) = \frac{5}{36}$ (pairs
(2,6),(3,5),(4,4),(5,3),(6,2)). Conditioning on the first die being ≥ 3 eliminates
(2,6), reducing the count from 5 to 4.
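- Question: A, B, and C are three events with $P(A) = P(B) = P(C) = \frac{1}{3}$,
$P(A \cap B) = P(A \cap C) = P(B \cap C) = \frac{1}{6}$, and $P(A \cap B \cap C) = \frac{1}{12}$.
Find $P(A \cup B \cup C)$.
Answer
Using the three-event addition rule:
$$\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \\
&= \frac{1}{3} + \frac{1}{3} + \frac{1}{3} - \frac{1}{6} - \frac{1}{6} - \frac{1}{6} + \frac{1}{12} \\
&= 1 - \frac{1}{2} + \frac{1}{12} = \frac{12}{12} - \frac{6}{12} + \frac{1}{12} = \frac{7}{12}
\end{aligned}$$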
- Question: A test for a condition has a sensitivity of 90% and a specificity of 95%. The
condition prevalence in the population is 1%. Find the positive predictive value
P(condition∣positive).
Answer
- Sensitivity: P(+∣C)=0.90.
- Specificity: P(−∣C′)=0.95, so P(+∣C′)=1−0.95=0.05.
- Prevalence: P(C)=0.01, P(C′)=0.99.
By Bayes' theorem:
$$P(C \mid +) = \frac{P(+ \mid C) \cdot P(C)}{P(+ \mid C) \cdot P(C) + P(+ \mid C') \cdot P(C')} = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99} = \frac{0.009}{0.009 + 0.0495} = \frac{0.009}{0.0585} \approx 0.154$$
A positive result means only about a 15.4% chance of actually having the condition. This is the base
rate fallacy in action: low prevalence swamps even a good test's signal.
For the A-Level treatment of this topic, see Probability.
Diagnostic Test
Ready to test your understanding of Probability? The diagnostic test contains the hardest questions within the DSE specification for this topic, each with a full worked solution.
Unit tests probe edge cases and common misconceptions. Integration tests combine Probability with other DSE mathematics topics to test synthesis under exam conditions.
See Diagnostic Guide for instructions on self-marking and building a personal test matrix.