
Probability

Probability theory provides a rigorous mathematical framework for quantifying uncertainty. In the DSE compulsory syllabus, we focus on discrete probability spaces, combinatorial counting, conditional probability, and independence. This page connects to combinatorics for counting techniques and to dispersion for the statistical interpretation of probability distributions.

Notations

| Symbol | Meaning |
| --- | --- |
| $P(A)$ | Probability of event $A$ |
| $P(A \mid B)$ | Conditional probability of $A$ given $B$ |
| $A \cap B$ | Intersection: both $A$ and $B$ occur |
| $A \cup B$ | Union: $A$ or $B$ or both occur |
| $A'$ or $\bar{A}$ | Complement of $A$: $A$ does not occur |
| $A \subseteq B$ | $A$ is a subset of $B$ |
| $\emptyset$ | Empty set (impossible event) |
| $S$ or $\Omega$ | Sample space (universal set of all outcomes) |

Basic Probability and Sample Spaces

Definitions

Sample space $S$ (or $\Omega$): the set of all possible outcomes of a random experiment.

Event: a subset of the sample space. Events that consist of exactly one outcome are called elementary events (or simple events).

Exhaustive events: a collection of events $\{E_1, E_2, \ldots, E_n\}$ is exhaustive if $E_1 \cup E_2 \cup \cdots \cup E_n = S$, so that every possible outcome is covered.

Partition: a collection of events is a partition of $S$ if the events are pairwise mutually exclusive and exhaustive. Each outcome in $S$ belongs to exactly one event in the partition.

Classical (Laplace) Definition

If all elementary outcomes in a finite sample space $S$ are equally likely, then for any event $A$:

$$P(A) = \frac{|A|}{|S|} = \frac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$$

This definition reduces probability to a counting problem and connects directly to the techniques in combinatorics.

Example

A fair six-sided die is rolled. The sample space is $S = \{1, 2, 3, 4, 5, 6\}$.

  • $P(\text{even}) = \frac{|\{2, 4, 6\}|}{|S|} = \frac{3}{6} = \frac{1}{2}$.
  • $P(\text{prime}) = \frac{|\{2, 3, 5\}|}{|S|} = \frac{3}{6} = \frac{1}{2}$.
  • $P(\text{even and prime}) = \frac{|\{2\}|}{|S|} = \frac{1}{6}$.
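The classical definition can be checked by direct counting. The following is a minimal sketch, assuming a finite sample space with equally likely outcomes; the helper name `classical_probability` is illustrative and not part of the syllabus.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Return |A| / |S|, assuming all outcomes in S are equally likely."""
    sample_space = set(sample_space)
    favourable = set(event) & sample_space  # ignore anything outside S
    return Fraction(len(favourable), len(sample_space))


S = {1, 2, 3, 4, 5, 6}
even = {x for x in S if x % 2 == 0}
prime = {2, 3, 5}

p_even = classical_probability(even, S)
p_prime = classical_probability(prime, S)
p_even_and_prime = classical_probability(even & prime, S)

print(p_even, p_prime, p_even_and_prime)  # 1/2 1/2 1/6
```

Using `Fraction` keeps the answers exact, which matches how the results are written in the worked example.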

Frequentist Interpretation

If an experiment is repeated $n$ times under identical conditions and event $A$ occurs $n_A$ times, then the probability of $A$ is interpreted as the limiting relative frequency:

$$P(A) = \lim_{n \to \infty} \frac{n_A}{n}$$

In practice, $\frac{n_A}{n}$ is used as an estimate of $P(A)$ for large $n$. The frequentist interpretation motivates the axioms: long-run relative frequencies behave consistently with the rules below.
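A short simulation illustrates how the relative frequency settles near the true probability as the number of trials grows. This is only a sketch: the function name `relative_frequency`, the fixed seed, and the trial counts are arbitrary choices made for illustration.

```python
import random


def relative_frequency(event, trials, seed=0):
    """Estimate P(event) for one fair die roll as n_A / n over `trials` rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials


def is_even(roll):
    return roll % 2 == 0


# The estimate should drift towards the exact value 1/2 as n increases.
estimates = {n: relative_frequency(is_even, n) for n in (100, 10_000, 200_000)}
for n, estimate in estimates.items():
    print(n, estimate)
```

The estimates fluctuate for small $n$ and are not guaranteed to improve monotonically; only the long-run behaviour is stable.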

Axioms of Probability (Kolmogorov)

Any probability measure $P$ defined on a sample space $S$ must satisfy three axioms:

  1. Non-negativity: $P(A) \geq 0$ for every event $A \subseteq S$.
  2. Normalization: $P(S) = 1$.
  3. Additivity: For any countable collection of pairwise mutually exclusive events $A_1, A_2, \ldots$:

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

These three axioms are the foundation of all probability theory. Every theorem and formula on this page derives from them.

Fundamental Theorems

Theorem 1. $P(\emptyset) = 0$.

Proof. $S$ and $\emptyset$ are mutually exclusive and $S \cup \emptyset = S$. By Axiom 3:

$$P(S) = P(S \cup \emptyset) = P(S) + P(\emptyset)$$

By Axiom 2, $P(S) = 1$, so $1 = 1 + P(\emptyset)$, hence $P(\emptyset) = 0$. $\square$

Theorem 2. $P(A') = 1 - P(A)$.

Proof. $A$ and $A'$ are mutually exclusive and $A \cup A' = S$. By Axioms 2 and 3:

$$1 = P(S) = P(A \cup A') = P(A) + P(A')$$

Therefore $P(A') = 1 - P(A)$. $\square$

Theorem 3. If $A \subseteq B$, then $P(A) \leq P(B)$.

Proof. Decompose $B = A \cup (B \cap A')$. Since $A$ and $B \cap A'$ are mutually exclusive, by Axiom 3:

$$P(B) = P(A) + P(B \cap A')$$

By Axiom 1, $P(B \cap A') \geq 0$, so $P(B) \geq P(A)$. $\square$

Corollary. $0 \leq P(A) \leq 1$ for any event $A$.

Proof. Since $\emptyset \subseteq A \subseteq S$, Theorem 3 gives $P(\emptyset) \leq P(A) \leq P(S)$, i.e., $0 \leq P(A) \leq 1$. $\square$

Example

Let $S = \{1, 2, 3, 4, 5, 6\}$ with uniform probability. Let $A = \{1, 2\}$ and $B = \{1, 2, 3, 4\}$.

  • $A \subseteq B$: confirmed, since every element of $A$ is in $B$.
  • $P(A) = \frac{2}{6} = \frac{1}{3}$, $P(B) = \frac{4}{6} = \frac{2}{3}$.
  • $P(A) \leq P(B)$: $\frac{1}{3} \leq \frac{2}{3}$. $\checkmark$

Addition Rule

General Addition Rule

Theorem. For any two events $A$ and $B$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Proof. Decompose $B = (A \cap B) \cup (B \cap A')$. These are mutually exclusive, so:

$$\begin{aligned} P(B) &= P(A \cap B) + P(B \cap A') \\ \implies P(B \cap A') &= P(B) - P(A \cap B) \end{aligned}$$

Now $A \cup B = A \cup (B \cap A')$, and $A$ and $B \cap A'$ are mutually exclusive:

$$\begin{aligned} P(A \cup B) &= P(A) + P(B \cap A') \\ &= P(A) + P(B) - P(A \cap B) \quad \square \end{aligned}$$

The subtraction of $P(A \cap B)$ corrects for double-counting: outcomes in both $A$ and $B$ are counted once in $P(A)$ and once in $P(B)$, so we subtract one copy.

Mutually Exclusive Events

Two events $A$ and $B$ are mutually exclusive (disjoint) if $A \cap B = \emptyset$, i.e., they cannot occur simultaneously.

When $A \cap B = \emptyset$, the general addition rule reduces to:

$$P(A \cup B) = P(A) + P(B)$$

This follows directly because $P(A \cap B) = P(\emptyset) = 0$ by Theorem 1.

Example

A card is drawn from a standard 52-card deck.

  • Let $A$ = "drawing a king", $B$ = "drawing a queen".
  • $A \cap B = \emptyset$ (a card cannot be both a king and a queen).
  • $P(A \cup B) = P(A) + P(B) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13}$.

Extension to Three Events

$$\begin{aligned} P(A \cup B \cup C) &= P(A) + P(B) + P(C) \\ &\quad - P(A \cap B) - P(A \cap C) - P(B \cap C) \\ &\quad + P(A \cap B \cap C) \end{aligned}$$

Proof sketch. Apply the two-event rule to $P((A \cup B) \cup C)$:

$$\begin{aligned} P((A \cup B) \cup C) &= P(A \cup B) + P(C) - P((A \cup B) \cap C) \\ &= [P(A) + P(B) - P(A \cap B)] + P(C) - P((A \cap C) \cup (B \cap C)) \end{aligned}$$

Applying the two-event rule to the last term, and noting that $(A \cap C) \cap (B \cap C) = A \cap B \cap C$:

$$P((A \cap C) \cup (B \cap C)) = P(A \cap C) + P(B \cap C) - P(A \cap B \cap C)$$

Substituting back and rearranging yields the stated result. $\square$

The pattern continues: for $n$ events, you alternate adding and subtracting terms of each intersection size. This is the probability analogue of the inclusion-exclusion principle.

Inclusion-Exclusion Principle

The addition rule is the probability counterpart of the inclusion-exclusion principle for counting:

$$|A \cup B| = |A| + |B| - |A \cap B|$$

When outcomes are equally likely, dividing both sides by $|S|$ yields the general addition rule. The full inclusion-exclusion principle extends to $n$ sets and is covered in combinatorics.
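The alternating pattern for several events can be sketched in a few lines and compared against a direct count of the union. The function `union_probability` and the three example events on $S = \{1, \ldots, 12\}$ are illustrative assumptions, not taken from the syllabus; equally likely outcomes are assumed throughout.

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, sample_space):
    """P(A_1 ∪ ... ∪ A_n) by inclusion-exclusion, for equally likely outcomes."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1  # add odd-sized, subtract even-sized terms
        for group in combinations(events, k):
            common = set.intersection(*group)
            total += sign * Fraction(len(common), len(sample_space))
    return total


S = set(range(1, 13))
A = {x for x in S if x % 2 == 0}  # multiples of 2
B = {x for x in S if x % 3 == 0}  # multiples of 3
C = {x for x in S if x % 4 == 0}  # multiples of 4

p_union = union_probability([A, B, C], S)
p_direct = Fraction(len(A | B | C), len(S))  # count the union directly
print(p_union, p_direct)  # 2/3 2/3
```

The number of terms grows as $2^n - 1$, so this approach is only practical for a handful of events.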

DSE-style Example

In a class of 40 students, 25 study Physics, 20 study Chemistry, and 8 study both. A student is chosen at random. Find the probability that the student studies at least one of the two subjects.

Let $P$ = studies Physics, $C$ = studies Chemistry.

$$\begin{aligned} P(P \cup C) &= P(P) + P(C) - P(P \cap C) \\ &= \frac{25}{40} + \frac{20}{40} - \frac{8}{40} = \frac{37}{40} \end{aligned}$$

The probability that the student studies neither subject:

$$P((P \cup C)') = 1 - \frac{37}{40} = \frac{3}{40}$$

Note: The addition rule generalises naturally. For any number of events, the key insight is: add all individual probabilities, subtract all pairwise intersections, add back all triple intersections, and so on, alternating signs.

Conditional Probability

Definition

The conditional probability of $A$ given $B$ is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$

Intuition: Shrinking the Sample Space

Conditioning on $B$ means we restrict our universe to outcomes where $B$ has occurred. The sample space effectively shrinks from $S$ to $B$. When all outcomes are equally likely, the probability of $A$ within this restricted space is the proportion of $B$ that also belongs to $A$:

$$P(A \mid B) = \frac{|A \cap B|}{|B|}$$

This agrees with the definition: dividing the numerator and denominator by $|S|$ gives $\frac{P(A \cap B)}{P(B)}$.

Example

A fair die is rolled. Given that the result is even, find the probability that it is greater than 4.

  • $B$ = "even" = $\{2, 4, 6\}$, so $P(B) = \frac{3}{6} = \frac{1}{2}$.
  • $A$ = "greater than 4" = $\{5, 6\}$.
  • $A \cap B = \{6\}$, so $P(A \cap B) = \frac{1}{6}$.
  • $P(A \mid B) = \frac{1/6}{1/2} = \frac{1}{3}$.

Verification by shrinking: within $\{2, 4, 6\}$, only $6$ is greater than $4$, so the probability is $\frac{1}{3}$. $\checkmark$
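Both routes to the conditional probability, the definition and the shrunken sample space, can be compared directly. This is a minimal sketch assuming equally likely outcomes; `conditional_probability` is a hypothetical helper name.

```python
from fractions import Fraction


def conditional_probability(a, b, sample_space):
    """P(A | B) = |A ∩ B| / |B| on a finite, equally likely sample space."""
    b = set(b) & set(sample_space)
    if not b:
        raise ValueError("P(B) must be positive to condition on B")
    return Fraction(len(set(a) & b), len(b))


S = set(range(1, 7))
B = {x for x in S if x % 2 == 0}  # "even"
A = {x for x in S if x > 4}       # "greater than 4"

# Route 1: shrink the sample space to B and count.
p_shrunk = conditional_probability(A, B, S)

# Route 2: apply the definition P(A ∩ B) / P(B) on the full sample space.
p_definition = Fraction(len(A & B), len(S)) / Fraction(len(B), len(S))

print(p_shrunk, p_definition)  # 1/3 1/3
```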

Fundamental Properties

Theorem. For $P(B) > 0$, $P(A \mid B) = 1 - P(A' \mid B)$.

Proof. $A \cap B$ and $A' \cap B$ partition $B$ (they are mutually exclusive and their union is $B$), so by Axiom 3, and then dividing through by $P(B)$:

$$\begin{aligned} P(B) &= P(A \cap B) + P(A' \cap B) \\ 1 &= \frac{P(A \cap B)}{P(B)} + \frac{P(A' \cap B)}{P(B)} \\ 1 &= P(A \mid B) + P(A' \mid B) \end{aligned}$$

Therefore $P(A \mid B) = 1 - P(A' \mid B)$. $\square$

This is the conditional analogue of Theorem 2: the conditional probabilities of an event and its complement must sum to 1 within the conditioned space.

Multiplication Rule

Theorem. $P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$.

Proof. From the definition of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \implies P(A \cap B) = P(B) \cdot P(A \mid B)$$

By symmetry, swapping $A$ and $B$:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \implies P(A \cap B) = P(A) \cdot P(B \mid A) \quad \square$$

For three events, the multiplication rule extends naturally:

$$P(A \cap B \cap C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B)$$

This follows by two applications of the two-event rule:

$$P(A \cap B \cap C) = P(A \cap B) \cdot P(C \mid A \cap B) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B)$$

Example

A bag contains 5 red and 3 blue balls. Two balls are drawn without replacement. Find the probability that both are red.

  • $P(\text{1st red}) = \frac{5}{8}$.
  • $P(\text{2nd red} \mid \text{1st red}) = \frac{4}{7}$.
  • $P(\text{both red}) = \frac{5}{8} \times \frac{4}{7} = \frac{20}{56} = \frac{5}{14}$.

With replacement, the second draw is unaffected by the first:

  • $P(\text{both red, with replacement}) = \frac{5}{8} \times \frac{5}{8} = \frac{25}{64}$.
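The two answers can be confirmed by listing every ordered pair of draws, treating the eight balls as distinguishable. This is an illustrative brute-force check rather than an exam technique; the helper `prob_both_red` is a hypothetical name.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["R"] * 5 + ["B"] * 3  # 5 red and 3 blue balls


def prob_both_red(ordered_draws):
    """Fraction of equally likely ordered pairs in which both balls are red."""
    draws = list(ordered_draws)
    favourable = sum(1 for first, second in draws if first == "R" and second == "R")
    return Fraction(favourable, len(draws))


# Without replacement: ordered pairs of two different balls (8 x 7 = 56 pairs).
p_without = prob_both_red(permutations(balls, 2))

# With replacement: the same ball may be drawn twice (8 x 8 = 64 pairs).
p_with = prob_both_red(product(balls, repeat=2))

print(p_without, p_with)  # 5/14 25/64
```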

Independence

Definition

Two events $A$ and $B$ are independent if and only if:

$$P(A \cap B) = P(A) \cdot P(B)$$

Equivalent Characterisation

Theorem. $A$ and $B$ are independent if and only if $P(A \mid B) = P(A)$ (provided $P(B) > 0$).

Proof.

($\Rightarrow$) If independent, $P(A \cap B) = P(A) \cdot P(B)$, so:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A)$$

($\Leftarrow$) If $P(A \mid B) = P(A)$, then:

$$\frac{P(A \cap B)}{P(B)} = P(A) \implies P(A \cap B) = P(A) \cdot P(B) \quad \square$$

Independence means that knowing $B$ occurred provides no information about $A$: the conditional probability is unchanged from the unconditional one.

Complement Independence

Theorem. If $A$ and $B$ are independent, then each of the following pairs is also independent: $A$ and $B'$, $A'$ and $B$, $A'$ and $B'$.

Proof. We show $A$ and $B'$ are independent. The others follow by identical reasoning.

$$\begin{aligned} P(A \cap B') &= P(A) - P(A \cap B) && \text{(since } A = (A \cap B) \cup (A \cap B') \text{ is a disjoint union)} \\ &= P(A) - P(A) \cdot P(B) && \text{(by independence)} \\ &= P(A)(1 - P(B)) \\ &= P(A) \cdot P(B') \quad \square \end{aligned}$$

Example

Two fair coins are tossed. Let $A$ = "first coin is heads" and $B$ = "second coin is tails".

  • $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{2}$, $P(A \cap B) = \frac{1}{4}$.
  • $P(A) \cdot P(B) = \frac{1}{4} = P(A \cap B)$. Independent. $\checkmark$

Now $A'$ = "first coin is tails" and $B'$ = "second coin is heads":

  • $P(A') = \frac{1}{2}$, $P(B') = \frac{1}{2}$, $P(A' \cap B') = P(\text{tails, heads}) = \frac{1}{4}$.
  • $P(A') \cdot P(B') = \frac{1}{4} = P(A' \cap B')$. Independent. $\checkmark$

Common Pitfall: Mutually Exclusive $\neq$ Independent

Theorem. If $A$ and $B$ are mutually exclusive with $P(A) > 0$ and $P(B) > 0$, then $A$ and $B$ are not independent.

Proof. If $A \cap B = \emptyset$, then $P(A \cap B) = 0$. But $P(A) \cdot P(B) > 0$ since both factors are positive. Therefore $P(A \cap B) \neq P(A) \cdot P(B)$, so $A$ and $B$ are not independent. $\square$

Intuition: mutually exclusive events carry strong negative information about each other, because knowing one occurred guarantees the other did not. Independence means no information transfer at all. These are opposite extremes.

The only case where mutually exclusive events are also independent is the degenerate case where at least one event has probability zero.

DSE-style Example

A fair coin is tossed 3 times. Let $A$ = "all three tosses are heads" and $B$ = "the first toss is tails".

  • $P(A) = \frac{1}{8}$, $P(B) = \frac{1}{2}$.
  • $A \cap B = \emptyset$ (cannot have all heads if the first is tails), so $P(A \cap B) = 0$.
  • Check independence: $P(A) \cdot P(B) = \frac{1}{8} \times \frac{1}{2} = \frac{1}{16} \neq 0 = P(A \cap B)$.
  • $A$ and $B$ are mutually exclusive but not independent.

For a valid independence example in the same experiment: let $C$ = "first toss is heads" and $D$ = "second toss is tails". Then $P(C \cap D) = \frac{1}{4} = \frac{1}{2} \times \frac{1}{2} = P(C) \cdot P(D)$, so $C$ and $D$ are independent.
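The product test for independence is easy to automate on a small sample space. The sketch below enumerates the eight outcomes of three fair tosses; the helper names `prob` and `independent` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))  # 8 equally likely outcomes such as ('H', 'T', 'H')


def prob(event):
    """P(event) by counting outcomes for which the predicate holds."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))


def independent(e1, e2):
    """Compare P(E1 ∩ E2) with P(E1) * P(E2)."""
    both = prob(lambda outcome: e1(outcome) and e2(outcome))
    return both == prob(e1) * prob(e2)


def all_heads(o):
    return o == ("H", "H", "H")


def first_tails(o):
    return o[0] == "T"


def first_heads(o):
    return o[0] == "H"


def second_tails(o):
    return o[1] == "T"


print(independent(all_heads, first_tails))     # False: mutually exclusive
print(independent(first_heads, second_tails))  # True
```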

Note: When testing independence, always compute both $P(A \cap B)$ and $P(A) \cdot P(B)$ separately and compare. Do not assume independence from the problem description; it must be verified or explicitly stated.

Bayes' Theorem

Statement

For two events $A$ and $B$ with $P(A) > 0$ and $P(B) > 0$:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Proof

Starting from the definition of conditional probability and the multiplication rule:

$$\begin{aligned} P(A \mid B) &= \frac{P(A \cap B)}{P(B)} \\ &= \frac{P(B \mid A) \cdot P(A)}{P(B)} \quad \square \end{aligned}$$

Bayes' theorem "reverses" the conditioning: it expresses $P(A \mid B)$ in terms of $P(B \mid A)$. This is the mathematical foundation of statistical inference, that is, of updating beliefs given evidence.

Law of Total Probability

If $B_1, B_2, \ldots, B_n$ form a partition of $S$ (pairwise mutually exclusive, exhaustive, and $P(B_i) > 0$ for all $i$), then for any event $A$:

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i)$$

Proof. Since the $B_i$ partition $S$, the events $A \cap B_1, A \cap B_2, \ldots, A \cap B_n$ are pairwise mutually exclusive and their union equals $A$. By Axiom 3 and the multiplication rule:

$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i) \cdot P(B_i) \quad \square$$

Extended Bayes' Theorem

Combining Bayes' theorem with the law of total probability gives the most useful form. If $B_1, \ldots, B_n$ partition $S$ and $A$ is one of them (say $A = B_j$), then for any event $E$ with $P(E) > 0$:

$$P(B_j \mid E) = \frac{P(E \mid B_j) \cdot P(B_j)}{\displaystyle\sum_{i=1}^{n} P(E \mid B_i) \cdot P(B_i)}$$

The denominator is $P(E)$ computed via the law of total probability.

Example: Medical Testing

A disease affects 1% of a population. A test has:

  • Sensitivity (true positive rate): $P(\text{positive} \mid \text{disease}) = 0.95$.
  • False positive rate: $P(\text{positive} \mid \text{no disease}) = 0.02$.

A person tests positive. What is the probability they actually have the disease?

Let $D$ = has disease, $D'$ = no disease, $+$ = tests positive. The partition is $\{D, D'\}$.

By Bayes' theorem:

$$\begin{aligned} P(D \mid +) &= \frac{P(+ \mid D) \cdot P(D)}{P(+ \mid D) \cdot P(D) + P(+ \mid D') \cdot P(D')} \\ &= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.02 \times 0.99} \\ &= \frac{0.0095}{0.0095 + 0.0198} \\ &= \frac{0.0095}{0.0293} \approx 0.324 \end{aligned}$$

Although the test detects 95% of true cases, a positive result means only about a 32.4% chance of disease. This counterintuitive result occurs because the disease is rare: false positives ($0.0198$ of the population) outnumber true positives ($0.0095$) by roughly two to one.
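One way to make the result less surprising is to restate it in head counts. The sketch below assumes a hypothetical population of 100,000 people (a number chosen only so that every count is a whole number) and applies the same rates as the example.

```python
from fractions import Fraction

population = 100_000                     # hypothetical population size
diseased = population * 1 // 100         # 1% prevalence
healthy = population - diseased

true_positives = diseased * 95 // 100    # sensitivity 95%
false_positives = healthy * 2 // 100     # false positive rate 2%

# Among everyone who tests positive, what share actually has the disease?
ppv = Fraction(true_positives, true_positives + false_positives)

print(true_positives, false_positives)   # 950 1980
print(round(float(ppv), 3))              # 0.324
```

Counting people rather than multiplying probabilities gives the same ratio as Bayes' theorem, because the population size cancels in the final fraction.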

Example: Quality Control

A factory has three machines producing items. Machine $M_1$ produces 50% of items with 2% defective. Machine $M_2$ produces 30% with 3% defective. Machine $M_3$ produces 20% with 5% defective. An item is randomly selected and found to be defective. What is the probability it came from $M_3$?

Let $D$ = defective. The partition is $\{M_1, M_2, M_3\}$.

$$\begin{aligned} P(M_3 \mid D) &= \frac{P(D \mid M_3) \cdot P(M_3)}{P(D \mid M_1) \cdot P(M_1) + P(D \mid M_2) \cdot P(M_2) + P(D \mid M_3) \cdot P(M_3)} \\ &= \frac{0.05 \times 0.20}{0.02 \times 0.50 + 0.03 \times 0.30 + 0.05 \times 0.20} \\ &= \frac{0.01}{0.01 + 0.009 + 0.01} \\ &= \frac{0.01}{0.029} \approx 0.345 \end{aligned}$$

Despite $M_3$ having the highest defect rate, it accounts for only about 34.5% of defective items because it produces the smallest share of total output.
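The extended form of Bayes' theorem translates into a short routine that works for any finite partition. This is a sketch under the assumption that the priors sum to 1; the function name `posterior` and the dictionary layout are illustrative choices.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(B_j | E) for every B_j in a partition.

    priors[h] is P(B_h) and likelihoods[h] is P(E | B_h).
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}  # P(E ∩ B_h)
    evidence = sum(joint.values())  # P(E), by the law of total probability
    if evidence == 0:
        raise ValueError("P(E) must be positive")
    return {h: joint[h] / evidence for h in joint}


priors = {"M1": Fraction(50, 100), "M2": Fraction(30, 100), "M3": Fraction(20, 100)}
defect_rates = {"M1": Fraction(2, 100), "M2": Fraction(3, 100), "M3": Fraction(5, 100)}

post = posterior(priors, defect_rates)
print(post["M3"], round(float(post["M3"]), 3))  # 10/29 0.345
```

The posteriors always sum to 1, which is a quick sanity check on any hand calculation.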

Probability Trees

Construction

A probability tree is a directed graph that decomposes a multi-stage experiment into sequential branches. Each level of the tree represents a stage, each branch represents an outcome at that stage, and each branch is labelled with its probability.

Rules:

  1. The probabilities on branches from each node must sum to 1.
  2. The probability of any complete path (root to leaf) is the product of all branch probabilities along that path (multiplication rule).
  3. The probability of an event is the sum of probabilities of all paths leading to that event (addition rule for mutually exclusive paths).

Worked Example

Two balls are drawn without replacement from a bag containing 4 red and 2 blue balls.

Tree diagram (text representation):

    Root
      |-- 4/6 (R)
      |     |-- 3/5 (R)  =>  P = 4/6 x 3/5 = 12/30
      |     `-- 2/5 (B)  =>  P = 4/6 x 2/5 = 8/30
      `-- 2/6 (B)
            |-- 4/5 (R)  =>  P = 2/6 x 4/5 = 8/30
            `-- 1/5 (B)  =>  P = 2/6 x 1/5 = 2/30

Verification: all path probabilities sum to $\frac{12 + 8 + 8 + 2}{30} = \frac{30}{30} = 1$. $\checkmark$

  • $P(\text{both red}) = \frac{12}{30} = \frac{2}{5}$.
  • $P(\text{both blue}) = \frac{2}{30} = \frac{1}{15}$.
  • $P(\text{same colour}) = \frac{12}{30} + \frac{2}{30} = \frac{14}{30} = \frac{7}{15}$.
  • $P(\text{different colours}) = \frac{8}{30} + \frac{8}{30} = \frac{16}{30} = \frac{8}{15}$.

Note: $P(\text{same}) + P(\text{different}) = 1$, as expected since these events are complements.
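The three tree rules can be mirrored by a small recursive walk: multiply along each path and add across the paths that make up an event. This is a sketch for two-colour draws without replacement; the generator name `leaf_probabilities` and its parameters are assumptions made for illustration.

```python
from fractions import Fraction


def leaf_probabilities(red, blue, draws, path=(), prob=Fraction(1)):
    """Yield (path, probability) for every root-to-leaf path of the tree."""
    if draws == 0:
        yield path, prob
        return
    total = red + blue
    if red > 0:  # branch: draw a red ball, leaving one fewer red
        yield from leaf_probabilities(
            red - 1, blue, draws - 1, path + ("R",), prob * Fraction(red, total)
        )
    if blue > 0:  # branch: draw a blue ball, leaving one fewer blue
        yield from leaf_probabilities(
            red, blue - 1, draws - 1, path + ("B",), prob * Fraction(blue, total)
        )


leaves = dict(leaf_probabilities(red=4, blue=2, draws=2))

p_same = leaves[("R", "R")] + leaves[("B", "B")]       # add across paths
p_different = leaves[("R", "B")] + leaves[("B", "R")]

print(p_same, p_different, sum(leaves.values()))  # 7/15 8/15 1
```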

Connection to Multiplication and Addition Rules

Probability trees are a visual encoding of the multiplication and addition rules:

  • Along a path (sequential stages): multiply probabilities -- this is the multiplication rule.
  • Across paths (mutually exclusive ways to reach an event): add probabilities -- this is the addition rule for mutually exclusive events.

Trees are especially useful for problems involving:

  • Sequential draws with or without replacement.
  • Multi-step processes where later probabilities depend on earlier outcomes.
  • Any scenario requiring the law of total probability (sum over all branches at the final level).

DSE-style Example

A box contains 3 defective and 7 good bulbs. Bulbs are tested one by one without replacement. Find the probability that the second defective bulb is found on the third test.

Finding the second defective bulb on the third test means: exactly one defective in the first two tests, and the third bulb is defective.

Case 1: Good, then Defective, then Defective:

$$P = \frac{7}{10} \times \frac{3}{9} \times \frac{2}{8} = \frac{42}{720} = \frac{7}{120}$$

Case 2: Defective, then Good, then Defective:

$$P = \frac{3}{10} \times \frac{7}{9} \times \frac{2}{8} = \frac{42}{720} = \frac{7}{120}$$

Total probability (addition rule for mutually exclusive cases):

$$P = \frac{7}{120} + \frac{7}{120} = \frac{14}{120} = \frac{7}{60}$$
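The case analysis can be cross-checked by listing every ordered sequence of three tested bulbs, treating the ten bulbs as distinguishable. This brute-force count is an illustrative check only; the variable names are arbitrary.

```python
from fractions import Fraction
from itertools import permutations

bulbs = ["D"] * 3 + ["G"] * 7  # 3 defective, 7 good

# All equally likely ordered triples of distinct bulbs: 10 x 9 x 8 = 720.
sequences = list(permutations(bulbs, 3))

# Second defective found on the third test:
# the third bulb is defective and exactly one of the first two is defective.
favourable = [s for s in sequences if s[2] == "D" and s[:2].count("D") == 1]

p = Fraction(len(favourable), len(sequences))
print(len(favourable), len(sequences), p)  # 84 720 7/60
```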

Wrap-up Questions

  1. Question: A fair coin is tossed three times. Find the probability of getting at least two heads.

Answer

Sample space has $2^3 = 8$ equally likely outcomes.

At least two heads means 2 or 3 heads:

  • 2 heads: $\{HHT, HTH, THH\}$, 3 outcomes.
  • 3 heads: $\{HHH\}$, 1 outcome.

$$P(\text{at least 2 heads}) = \frac{4}{8} = \frac{1}{2}$$

Alternatively, using the binomial formula:

$$P(\text{at least 2 heads}) = \binom{3}{2}\left(\frac{1}{2}\right)^3 + \binom{3}{3}\left(\frac{1}{2}\right)^3 = \frac{3}{8} + \frac{1}{8} = \frac{1}{2}$$

  2. Question: In a group of 50 students, 30 play basketball, 25 play football, and 10 play neither. A student is chosen at random. Find the probability that the student plays both sports.

Answer

Let $B$ = plays basketball, $F$ = plays football.

  • $P(B) = \frac{30}{50} = \frac{3}{5}$, $P(F) = \frac{25}{50} = \frac{1}{2}$.
  • 10 play neither, so 40 play at least one: $P(B \cup F) = \frac{40}{50} = \frac{4}{5}$.

Rearranging the addition rule:

$$\begin{aligned} P(B \cap F) &= P(B) + P(F) - P(B \cup F) \\ &= \frac{3}{5} + \frac{1}{2} - \frac{4}{5} = \frac{6}{10} + \frac{5}{10} - \frac{8}{10} = \frac{3}{10} \end{aligned}$$

  3. Question: A bag contains 4 white and 6 black balls. Two balls are drawn at random without replacement. Find the probability that they are of different colours.

Answer

Method 1 (direct): white then black, or black then white.

$$\begin{aligned} P &= \frac{4}{10} \times \frac{6}{9} + \frac{6}{10} \times \frac{4}{9} \\ &= \frac{24}{90} + \frac{24}{90} = \frac{48}{90} = \frac{8}{15} \end{aligned}$$

Method 2 (complement):

$$\begin{aligned} P(\text{different}) &= 1 - P(\text{same}) \\ &= 1 - P(\text{both white}) - P(\text{both black}) \\ P(\text{both white}) &= \frac{4}{10} \times \frac{3}{9} = \frac{12}{90} \\ P(\text{both black}) &= \frac{6}{10} \times \frac{5}{9} = \frac{30}{90} \\ P(\text{different}) &= 1 - \frac{12}{90} - \frac{30}{90} = \frac{48}{90} = \frac{8}{15} \quad \checkmark \end{aligned}$$

  4. Question: Events $A$ and $B$ are such that $P(A) = 0.6$, $P(B) = 0.5$, and $P(A \mid B) = 0.4$. Find $P(A \cup B)$.

Answer

From $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$:

$$P(A \cap B) = P(A \mid B) \cdot P(B) = 0.4 \times 0.5 = 0.2$$

By the addition rule:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.6 + 0.5 - 0.2 = 0.9$$

  5. Question: $A$ and $B$ are independent events with $P(A) = 0.3$ and $P(B) = 0.5$. Find $P(A' \cap B')$.

Answer

By the complement independence theorem, since $A$ and $B$ are independent, $A'$ and $B'$ are also independent:

$$P(A' \cap B') = P(A') \cdot P(B') = (1 - 0.3)(1 - 0.5) = 0.7 \times 0.5 = 0.35$$

Verification via complement: $P(A' \cap B') = P((A \cup B)') = 1 - P(A \cup B)$, where $P(A \cap B) = 0.3 \times 0.5 = 0.15$ by independence.

$$\begin{aligned} P(A \cup B) &= P(A) + P(B) - P(A \cap B) = 0.3 + 0.5 - 0.15 = 0.65 \\ P(A' \cap B') &= 1 - 0.65 = 0.35 \quad \checkmark \end{aligned}$$

  6. Question: Two events $A$ and $B$ satisfy $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{4}$, and $P(A \cup B) = \frac{5}{12}$. Determine whether $A$ and $B$ are independent.

Answer

Rearranging the addition rule:

$$\begin{aligned} P(A \cap B) &= P(A) + P(B) - P(A \cup B) \\ &= \frac{1}{3} + \frac{1}{4} - \frac{5}{12} = \frac{4}{12} + \frac{3}{12} - \frac{5}{12} = \frac{2}{12} = \frac{1}{6} \end{aligned}$$

Check independence: $P(A) \cdot P(B) = \frac{1}{3} \times \frac{1}{4} = \frac{1}{12} \neq \frac{1}{6} = P(A \cap B)$.

Since $P(A \cap B) \neq P(A) \cdot P(B)$, the events are not independent.

  7. Question: A box contains 5 red, 3 green, and 2 blue marbles. Three marbles are drawn without replacement. Find the probability that all three are the same colour.

Answer

The three colours are mutually exclusive cases, so by the addition rule:

$$\begin{aligned} P(\text{all red}) &= \frac{5}{10} \times \frac{4}{9} \times \frac{3}{8} = \frac{60}{720} = \frac{1}{12} \\ P(\text{all green}) &= \frac{3}{10} \times \frac{2}{9} \times \frac{1}{8} = \frac{6}{720} = \frac{1}{120} \\ P(\text{all blue}) &= \frac{2}{10} \times \frac{1}{9} \times \frac{0}{8} = 0 \end{aligned}$$

$$P(\text{all same colour}) = \frac{1}{12} + \frac{1}{120} + 0 = \frac{10}{120} + \frac{1}{120} = \frac{11}{120}$$

  8. Question: In a certain school, 60% of students take Mathematics, 40% take Physics, and 30% take both. A student is selected at random. Given that the student takes Mathematics, what is the probability that they also take Physics?

Answer

$$P(\text{Physics} \mid \text{Maths}) = \frac{P(\text{Physics} \cap \text{Maths})}{P(\text{Maths})} = \frac{0.30}{0.60} = 0.5$$

Half of Mathematics students also take Physics.

  9. Question: A factory produces items using Machine $X$ (60% of output) and Machine $Y$ (40% of output). The defect rates are 3% for $X$ and 7% for $Y$. An item is found to be defective. Use Bayes' theorem to find the probability it was produced by Machine $X$.

Answer

Let $D$ = defective. The partition is $\{X, Y\}$.

$$\begin{aligned} P(X \mid D) &= \frac{P(D \mid X) \cdot P(X)}{P(D \mid X) \cdot P(X) + P(D \mid Y) \cdot P(Y)} \\ &= \frac{0.03 \times 0.60}{0.03 \times 0.60 + 0.07 \times 0.40} \\ &= \frac{0.018}{0.018 + 0.028} = \frac{0.018}{0.046} \approx 0.391 \end{aligned}$$

Despite producing 60% of items, Machine $X$ accounts for only about 39.1% of defective items because its defect rate is lower.

  10. Question: A fair die is rolled twice. Find the probability that the sum of the two results is 8, given that the first result is at least 3.

Answer

Let $A$ = "sum is 8", $B$ = "first result $\geq 3$".

Sample space: $6 \times 6 = 36$ equally likely outcomes.

$|B| = 4 \times 6 = 24$ (first die shows 3, 4, 5, or 6).

$A \cap B$ = outcomes with first result $\geq 3$ and sum 8: $\{(3,5), (4,4), (5,3), (6,2)\}$, so $|A \cap B| = 4$.

$$P(A \mid B) = \frac{|A \cap B|}{|B|} = \frac{4}{24} = \frac{1}{6}$$

For comparison, the unconditional probability is $P(A) = \frac{5}{36}$ (pairs $(2,6), (3,5), (4,4), (5,3), (6,2)$). Conditioning on the first die being $\geq 3$ eliminates $(2,6)$, reducing the favourable count from 5 to 4, while the sample space shrinks from 36 to 24.

  11. Question: $A$, $B$, and $C$ are three events with $P(A) = P(B) = P(C) = \frac{1}{3}$, $P(A \cap B) = P(A \cap C) = P(B \cap C) = \frac{1}{6}$, and $P(A \cap B \cap C) = \frac{1}{12}$. Find $P(A \cup B \cup C)$.

Answer

Using the three-event addition rule:

$$\begin{aligned} P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \\ &= \frac{1}{3} + \frac{1}{3} + \frac{1}{3} - \frac{1}{6} - \frac{1}{6} - \frac{1}{6} + \frac{1}{12} \\ &= 1 - \frac{1}{2} + \frac{1}{12} \\ &= \frac{12}{12} - \frac{6}{12} + \frac{1}{12} = \frac{7}{12} \end{aligned}$$

  12. Question: A test for a condition has a sensitivity of 90% and a specificity of 95%. The condition prevalence in the population is 1%. Find the positive predictive value $P(\text{condition} \mid \text{positive})$.

Answer

  • Sensitivity: $P(+ \mid C) = 0.90$.
  • Specificity: $P(- \mid C') = 0.95$, so $P(+ \mid C') = 1 - 0.95 = 0.05$.
  • Prevalence: $P(C) = 0.01$, $P(C') = 0.99$.

By Bayes' theorem:

$$\begin{aligned} P(C \mid +) &= \frac{P(+ \mid C) \cdot P(C)}{P(+ \mid C) \cdot P(C) + P(+ \mid C') \cdot P(C')} \\ &= \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99} \\ &= \frac{0.009}{0.009 + 0.0495} \\ &= \frac{0.009}{0.0585} \approx 0.154 \end{aligned}$$

A positive result means only about a 15.4% chance of actually having the condition. Overlooking this is the base rate fallacy: low prevalence swamps even a good test's signal.

For the A-Level treatment of this topic, see Probability.


Diagnostic Test

Ready to test your understanding of Probability? The diagnostic test contains the hardest questions within the DSE specification for this topic, each with a full worked solution.

Unit tests probe edge cases and common misconceptions. Integration tests combine Probability with other DSE mathematics topics to test synthesis under exam conditions.

See Diagnostic Guide for instructions on self-marking and building a personal test matrix.