
Further Statistics (Statistics 1)

Grade 11 · Statistics 1 · Cambridge A-Level 9709 · Age 16–17

Welcome to Further Statistics!

This topic covers the final and most powerful ideas in Cambridge A-Level Statistics 1 (9709): measuring the strength and direction of relationships between variables using correlation, finding the line of best fit using linear regression, and drawing evidence-based conclusions using hypothesis testing. These tools are used throughout science, economics, medicine, and data analysis.

r = Sxy / √(Sxx · Syy)  |  y = a + bx, b = Sxy/Sxx  |  Z = (x̄ − μ0) / (σ/√n)

Learning Objectives

  • Calculate and interpret Pearson's product moment correlation coefficient r
  • Understand that −1 ≤ r ≤ 1 and interpret strength/direction of correlation
  • Identify that correlation does not imply causation
  • Find the equation of the regression line y = a + bx using Sxy and Sxx
  • Interpret the gradient b and intercept a in context
  • Understand the danger of extrapolation outside the data range
  • Set up null and alternative hypotheses H0 and H1
  • Choose one-tailed or two-tailed tests and identify the critical region
  • Conduct hypothesis tests for the probability parameter p in B(n, p)
  • Conduct hypothesis tests for the population mean μ using the Normal distribution
  • Write conclusions in context at a given significance level

Correlation r

Measures linear association: r = ±1 perfect; r = 0 no linear correlation

Scatter Diagrams

Visual display of bivariate data — look for pattern, direction, and outliers

Regression Line

y = a + bx (least squares); passes through (x̄, ȳ)

Extrapolation

Using the line outside the data range — unreliable and not recommended

H₀ and H₁

Null hypothesis (assumed true) vs alternative (what we want to test)

Critical Region

Values of the test statistic that lead to rejecting H0

Binomial Test

Test p using B(n, p0) — compare tail probability with α

Normal Mean Test

Standardise x̄ using Z = (x̄ − μ0) / (σ/√n); compare with zα

Learn 1 — Correlation

What Is Correlation?

Correlation measures the strength and direction of a linear relationship between two variables x and y. It is quantified by Pearson's product moment correlation coefficient r, which always lies between −1 and 1 (inclusive).

−1 ≤ r ≤ 1

Interpreting r

r = +1: Perfect positive linear correlation — as x increases, y increases exactly along a straight line.
r = −1: Perfect negative linear correlation — as x increases, y decreases exactly along a straight line.
r = 0: No linear correlation — knowing x tells you nothing about y (on a linear basis).

Strength guidelines (approximate):
• |r| ≥ 0.8 — strong correlation
• 0.5 ≤ |r| < 0.8 — moderate correlation
• |r| < 0.5 — weak correlation

Direction: r > 0 means positive correlation; r < 0 means negative correlation.

Calculating r

The formula uses three summary statistics: Sxx, Syy, and Sxy.

r = Sxy / √(Sxx · Syy)

where   Sxx = Σx² − n x̄²  |  Syy = Σy² − n ȳ²  |  Sxy = Σxy − n x̄ ȳ
Example: Five data points give Sxy = 48, Sxx = 60, Syy = 50. Find r.

r = 48 / √(60 × 50) = 48 / √3000 = 48 / 54.77 ≈ 0.876

Conclusion: strong positive linear correlation between x and y.
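As a quick sanity check, the calculation above can be reproduced in a few lines of Python (the helper name `correlation` is ours, for illustration only):

```python
import math

def correlation(sxy: float, sxx: float, syy: float) -> float:
    """Pearson's r from the summary statistics Sxy, Sxx, Syy."""
    return sxy / math.sqrt(sxx * syy)

# Worked example from above: Sxy = 48, Sxx = 60, Syy = 50
r = correlation(48, 60, 50)
print(round(r, 3))  # 0.876
```

The same helper confirms that any valid summary statistics give −1 ≤ r ≤ 1.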

Scatter Diagrams

Before calculating r, always draw (or inspect) a scatter diagram — a graph where each data point (x, y) is plotted as a dot. Look for:

Direction: points rising left-to-right (positive) or falling (negative)?
Linearity: do points cluster around a straight line, or a curve?
Strength: how tightly do points cluster around the trend line?
Outliers: any points far from the main pattern? These may heavily affect r.

Correlation Does NOT Imply Causation

Critical concept: Even if r is close to ±1, this does NOT mean x causes y to change. Both variables might be influenced by a third (confounding) variable, or the correlation could be coincidental.

Example: Ice cream sales and drowning rates are positively correlated — but ice cream does not cause drowning. Both increase in hot weather (a confounding variable).

Interpreting r in Context

When asked to interpret r in an exam:

1. State the strength: strong / moderate / weak.
2. State the direction: positive / negative.
3. State the variables: "there is a strong positive linear correlation between revision time and exam score."
4. Do NOT say "revision time causes exam score to increase" unless the question asks about causation specifically.
In the Cambridge S1 exam, you are often given r and asked to "comment on" or "interpret" it. Always use the context of the question, state the direction and strength, and never claim causation unless instructed.

Learn 2 — Linear Regression

The Regression Line y on x

When there is a linear relationship between x and y, we can find the least squares regression line y = a + bx. This line minimises the sum of squared vertical distances from each point to the line — giving the "best fit" straight line.

y = a + bx  |  b = Sxy / Sxx  |  a = ȳ − b x̄

Calculating b and a

Step 1: Calculate Sxy = Σxy − n x̄ ȳ and Sxx = Σx² − n x̄²
Step 2: b = Sxy / Sxx
Step 3: a = ȳ − b x̄
Step 4: Write the equation as y = a + bx
Example: n = 6, Σx = 30, Σy = 48, Σx² = 182, Σxy = 262.

x̄ = 30/6 = 5,   ȳ = 48/6 = 8
Sxx = 182 − 6(5)² = 182 − 150 = 32
Sxy = 262 − 6(5)(8) = 262 − 240 = 22
b = 22/32 = 0.6875
a = 8 − 0.6875 × 5 = 8 − 3.4375 = 4.5625
Regression line: y = 4.5625 + 0.6875x
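The four steps above can be sketched in Python; `regression_line` is an illustrative helper working directly from the summary sums:

```python
def regression_line(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares coefficients a, b for y = a + bx from summary sums."""
    x_bar, y_bar = sum_x / n, sum_y / n
    sxx = sum_x2 - n * x_bar ** 2      # Step 1
    sxy = sum_xy - n * x_bar * y_bar   # Step 1
    b = sxy / sxx                      # Step 2
    a = y_bar - b * x_bar              # Step 3
    return a, b

# Worked example: n = 6, Σx = 30, Σy = 48, Σx² = 182, Σxy = 262
a, b = regression_line(6, 30, 48, 182, 262)
print(a, b)       # 4.5625 0.6875
print(a + b * 7)  # prediction at x = 7: 9.375
```

The final line reproduces the prediction made later in this section.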

The Regression Line Passes Through (x̄, ȳ)

The least squares regression line always passes through the mean point (x̄, ȳ). This is because a = ȳ − b x̄, so substituting x = x̄ gives y = a + b x̄ = ȳ. Use this as a check.

Always verify: does your regression line pass through (x̄, ȳ)? Substitute x = x̄ — if you don't get ȳ, you have made an arithmetic error in calculating a.

Using the Regression Line for Prediction

Example (continued): Predict y when x = 7.
y = 4.5625 + 0.6875(7) = 4.5625 + 4.8125 = 9.375

Interpreting the Gradient b

The gradient b means: for each unit increase in x, y increases by b units (on average).

Example: If x = study hours and y = exam score, and b = 3.2, then:
"For each additional hour of study, the predicted exam score increases by 3.2 marks."

Interpreting the Intercept a

The intercept a is the predicted value of y when x = 0.

Caution: This may not have a meaningful interpretation if x = 0 is outside the data range (e.g., if x = height, x = 0 is nonsensical).

Extrapolation — The Danger Zone

Extrapolation means using the regression line to predict y for x values outside the range of the original data. This is unreliable because:
• The linear relationship may not hold beyond the data range.
• There may be natural limits or non-linearities not captured.

Interpolation (predicting within the data range) is generally reliable.

Learn 3 — Hypothesis Testing Introduction

What Is Hypothesis Testing?

A hypothesis test uses sample data to decide whether there is sufficient evidence to reject an assumption about a population. It provides a formal framework for drawing conclusions under uncertainty.

Step 1 — State the Hypotheses

Null Hypothesis H0: The "default" assumption — what we assume to be true unless the data provides strong evidence against it. Always contains an equals sign (e.g., H0: p = 0.3).

Alternative Hypothesis H1: What we are trying to find evidence for. Dictates the direction of the test.

One-Tailed vs Two-Tailed Tests

One-tailed test (right): H1: p > p0 — we suspect the true value is higher than assumed.
One-tailed test (left): H1: p < p0 — we suspect the true value is lower than assumed.
Two-tailed test: H1: p ≠ p0 — we suspect the true value has changed (direction unknown).

How to decide: Read the question carefully. Words like "has increased", "is higher", "more likely" suggest a one-tailed test. "Has changed", "is different" suggest two-tailed.

Step 2 — Choose Significance Level α

The significance level α is the probability of incorrectly rejecting H0 when it is actually true (a Type I error). Common values: α = 0.05 (5%) or α = 0.01 (1%).

Step 3 — Find the Critical Region

The critical region is the set of values of the test statistic that would lead to rejecting H0.

For one-tailed tests at 5%: for a right-tail test, find the smallest critical value c such that P(X ≥ c) ≤ 0.05; for a left-tail test, find the largest c such that P(X ≤ c) ≤ 0.05.

For two-tailed tests at 5%: split the 5% equally — find critical regions in both tails, each with probability ≤ 2.5%.

Step 4 — Compare Test Statistic with Critical Region

Calculate the probability of observing a result as extreme as (or more extreme than) the data, assuming H0 is true. This is the p-value.

If p-value ≤ α, reject H0.   If p-value > α, do not reject H0.

Step 5 — Write Conclusion in Context

If you reject H0: "There is sufficient evidence at the X% significance level to reject H0. The data suggests [H1 stated in context]."

If you do not reject H0: "There is insufficient evidence at the X% significance level to reject H0. The data is consistent with [H0 stated in context]."
Never say "we accept H0" — we either "reject H0" or "do not reject H0". Failing to reject H0 does not prove it is true; it just means the evidence against it was insufficient.
In Cambridge S1, marks are lost by failing to state the conclusion in context. Always refer back to the specific situation in the question — don't just say "reject H₀", say what that means for the problem (e.g., "The probability that the coin shows heads has changed").

Learn 4 — Testing Binomial Probability

Setting Up the Test

When we observe X successes in n trials and want to test a claimed probability p0, we use the model X ~ B(n, p0) under H0.

H0: p = p0  |  H1: p > p0 (or < or ≠)

One-Tailed Test — Right Tail (H₁: p > p₀)

Example: A seed company claims germination rate is 0.4. A gardener plants 20 seeds; 12 germinate. Test at 5% whether the germination rate has increased.

H0: p = 0.4    H1: p > 0.4 (one-tailed right)

Under H0: X ~ B(20, 0.4)
Test statistic: x = 12
p-value = P(X ≥ 12) = 1 − P(X ≤ 11)
Using tables: P(X ≤ 11) = 0.9435, so P(X ≥ 12) = 1 − 0.9435 = 0.0565

Since 0.0565 > 0.05, we do not reject H0.
Conclusion: "There is insufficient evidence at the 5% level to conclude that the germination rate has increased from 0.4."
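The same tail probability can be computed exactly, with no tables, using a short Python sketch; `binom_cdf` is an inline helper, not a library function:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Germination example: X ~ B(20, 0.4), observed x = 12, H1: p > 0.4
p_value = 1 - binom_cdf(11, 20, 0.4)   # P(X >= 12)
print(round(p_value, 4))               # 0.0565
print(p_value > 0.05)                  # True, so do not reject H0
```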

One-Tailed Test — Left Tail (H₁: p < p₀)

Example: A drug is claimed to cure 70% of patients (p = 0.7). In a trial of 15 patients, only 6 are cured. Test at 5% whether the cure rate has decreased.

H0: p = 0.7    H1: p < 0.7 (one-tailed left)

Under H0: X ~ B(15, 0.7)
p-value = P(X ≤ 6)
Using tables (via Y = 15 − X ~ B(15, 0.3), so P(X ≤ 6) = P(Y ≥ 9)): P(X ≤ 6) = 0.0152

Since 0.0152 < 0.05, we reject H0.
Conclusion: "There is sufficient evidence at the 5% significance level that the cure rate has decreased from 0.7."

Two-Tailed Test (H₁: p ≠ p₀)

For a two-tailed test at 5%, compare each tail probability with 2.5% (i.e., 0.025).

Example: A fair coin is tossed 20 times; 14 heads. Test at 5% whether the coin is biased.

H0: p = 0.5    H1: p ≠ 0.5 (two-tailed)

Under H0: X ~ B(20, 0.5)
x = 14, which is above the mean (10), so test the upper tail:
P(X ≥ 14) = 1 − P(X ≤ 13) = 1 − 0.9423 = 0.0577

Since 0.0577 > 0.025, we do not reject H0.
Conclusion: "There is insufficient evidence at the 5% level to conclude that the coin is biased."
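A minimal check of the two-tailed calculation, using an exact binomial CDF helper (ours, not a library call) and comparing the tail with α/2 = 0.025:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Coin example: X ~ B(20, 0.5), observed x = 14, above the mean of 10,
# so we examine the upper tail and compare it with 0.025.
upper_tail = 1 - binom_cdf(13, 20, 0.5)   # P(X >= 14)
print(round(upper_tail, 4))               # 0.0577
print(upper_tail <= 0.025)                # False, so do not reject H0
```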

Finding the Critical Region

Instead of comparing a specific value, we find the range of values that would lead to rejection.

Example: X ~ B(20, 0.3) under H0. Find the critical region for H1: p > 0.3 at 5%.

Find smallest c such that P(X ≥ c) ≤ 0.05:
P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0.9520 = 0.0480 ≤ 0.05 ✓
P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − 0.8867 = 0.1133 > 0.05 ✗

Critical region: X ≥ 10
For two-tailed critical regions, find the smallest c_upper such that P(X ≥ c_upper) ≤ 0.025, AND the largest c_lower such that P(X ≤ c_lower) ≤ 0.025. The critical region is X ≤ c_lower or X ≥ c_upper.
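The search for the smallest critical value can be automated; a sketch, with the helper `binom_tail_upper` defined inline:

```python
from math import comb

def binom_tail_upper(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c, n + 1))

# X ~ B(20, 0.3) under H0, H1: p > 0.3, alpha = 0.05:
# find the smallest c with P(X >= c) <= 0.05
c = next(c for c in range(21) if binom_tail_upper(c, 20, 0.3) <= 0.05)
print(c)                                       # 10
print(round(binom_tail_upper(c, 20, 0.3), 4))  # 0.048, the actual significance level
```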

Learn 5 — Testing the Normal Mean

When to Use This Test

When a population is normally distributed with known variance σ², and we take a sample of size n, the sample mean x̄ has distribution:

x̄ ~ N(μ, σ²/n)

We can test whether the true population mean μ equals a specific value μ0.

Setting Up the Test

H0: μ = μ0
H1: μ ≠ μ0 (two-tailed) or μ > μ0 or μ < μ0 (one-tailed)

The Test Statistic Z

Under H0, standardise the sample mean:

Z = (x̄ − μ0) / (σ / √n)

Under H0, Z ~ N(0,1). Compare Z with critical values from the standard Normal table.

Critical Values

Test type | Level α | Critical value | Reject H0 if
One-tailed (right) | 5% | 1.645 | Z > 1.645
One-tailed (left) | 5% | −1.645 | Z < −1.645
Two-tailed | 5% | ±1.96 | |Z| > 1.96
One-tailed (right) | 1% | 2.326 | Z > 2.326
Two-tailed | 1% | ±2.576 | |Z| > 2.576

Full Worked Example — Two-Tailed

Example: The masses of bags of flour are normally distributed with σ = 8 g. The manufacturer claims μ = 500 g. A sample of 25 bags has mean x̄ = 496.4 g. Test at 5% whether the mean has changed.

H0: μ = 500    H1: μ ≠ 500 (two-tailed)

Z = (496.4 − 500) / (8 / √25) = −3.6 / 1.6 = −2.25

Critical region: |Z| > 1.96
|−2.25| = 2.25 > 1.96   ⇒ Reject H0

Conclusion: "There is sufficient evidence at the 5% significance level to conclude that the mean mass of bags has changed from 500 g."

Full Worked Example — One-Tailed

Example: A teacher claims the mean test score is 60. After a new teaching method, a sample of 16 students has mean x̄ = 64. Given σ = 10, test at 1% whether the mean has increased.

H0: μ = 60    H1: μ > 60 (one-tailed right)

Z = (64 − 60) / (10 / √16) = 4 / 2.5 = 1.6

Critical value at 1% one-tailed: 2.326
1.6 < 2.326   ⇒ Do not reject H0

Conclusion: "There is insufficient evidence at the 1% significance level to conclude that the mean test score has increased from 60."
Always show the calculation of Z, state the critical value, state whether |Z| exceeds it, and then write a full conclusion in context. Each of these earns marks in Cambridge exams.
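Both worked examples can be verified with one small helper (`z_statistic` is an illustrative name, not a library function):

```python
import math

def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Test statistic Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

# Two-tailed flour example: x_bar = 496.4, mu0 = 500, sigma = 8, n = 25
z1 = z_statistic(496.4, 500, 8, 25)
print(round(z1, 2), abs(z1) > 1.96)   # -2.25 True  -> reject H0

# One-tailed test-score example: x_bar = 64, mu0 = 60, sigma = 10, n = 16
z2 = z_statistic(64, 60, 10, 16)
print(round(z2, 2), z2 > 2.326)       # 1.6 False -> do not reject H0
```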

Worked Examples

Example 1 — Describing Correlation from r

r = −0.87. Interpret this value in the context of the study: x = temperature (°C), y = hot drink sales per day.

Step 1: Note the sign: r is negative ⇒ negative correlation. M1
Step 2: Note the magnitude: |r| = 0.87, which is close to 1 ⇒ strong correlation. M1
Answer: "There is a strong negative linear correlation between temperature and hot drink sales. As temperature increases, hot drink sales tend to decrease." A1
Do not say "r = −0.87 means temperature causes fewer drinks to be sold." Correlation does not imply causation.

Example 2 — Regression Prediction with Interpretation

Given Sxy = 126, Sxx = 84, x̄ = 5, ȳ = 12. (i) Find the regression line. (ii) Predict y when x = 7. (iii) Interpret the gradient.

Step 1 (i): b = Sxy/Sxx = 126/84 = 1.5 M1
Step 2 (i): a = ȳ − b x̄ = 12 − 1.5 × 5 = 12 − 7.5 = 4.5 M1 A1
Regression line: y = 4.5 + 1.5x A1
Step 3 (ii): When x = 7: y = 4.5 + 1.5(7) = 4.5 + 10.5 = 15 B1
Step 4 (iii): "For each unit increase in x, y increases by 1.5 units on average." B1

Example 3 — One-Tailed Binomial Test (Full)

A coin is claimed to be fair. It is tossed 10 times and shows 8 heads. Test at 5% whether the probability of heads has increased.

H0: p = 0.5   H1: p > 0.5 (one-tailed right) B1
Under H0: X ~ B(10, 0.5), x = 8 M1
p-value: P(X ≥ 8) = P(X=8) + P(X=9) + P(X=10)
= C(10,8)(0.5)¹⁰ + C(10,9)(0.5)¹⁰ + C(10,10)(0.5)¹⁰
= (45 + 10 + 1)/1024 = 56/1024 ≈ 0.0547 M1 A1
Decision: 0.0547 > 0.05 ⇒ Do not reject H0 M1
Conclusion: "There is insufficient evidence at the 5% level to conclude that the probability of heads has increased from 0.5." A1

Example 4 — Two-Tailed Binomial Test

A thumbtack lands point-up with probability p. In 15 trials, it lands point-up 3 times. Test H0: p = 0.4 against H1: p ≠ 0.4 at 5%.

H0: p = 0.4   H1: p ≠ 0.4 (two-tailed) B1
Under H0: X ~ B(15, 0.4). x = 3 is below the mean (15 × 0.4 = 6). Test lower tail. M1
p-value (lower tail): P(X ≤ 3) = 0.0905 (from tables) M1 A1
Decision: Compare with 0.025 (two-tailed): 0.0905 > 0.025 ⇒ Do not reject H0 M1
Conclusion: "Insufficient evidence at 5% to conclude that p has changed from 0.4." A1

Example 5 — Finding the Critical Region (Binomial)

X ~ B(20, 0.25) under H0. Find the critical region for H1: p < 0.25 at 5%.

Lower tail: Find largest c such that P(X ≤ c) ≤ 0.05 M1
Check: P(X ≤ 2) = 0.0913 > 0.05 ✗   P(X ≤ 1) = 0.0243 ≤ 0.05 ✓ M1 A1
Critical region: X ≤ 1 A1
Actual significance level: P(X ≤ 1) = 0.0243 = 2.43% B1

Example 6 — Two-Tailed Normal Mean Test

Heights of adult males are N(μ, 49). A sample of 36 gives x̄ = 172.8 cm. Test H0: μ = 175 at 5% (two-tailed).

H0: μ = 175   H1: μ ≠ 175 (two-tailed) B1
Z = (172.8 − 175) / (7 / √36) = −2.2 / 1.1667 = −1.886 M1 A1
Critical region: |Z| > 1.96 at 5% two-tailed B1
Decision: |−1.886| = 1.886 < 1.96 ⇒ Do not reject H0 M1
Conclusion: "Insufficient evidence at 5% to conclude mean height differs from 175 cm." A1

Example 7 — Finding the p-value for a Binomial Test

Under H0: X ~ B(12, 0.3). Observed x = 7. H1: p > 0.3. Find the p-value and state the conclusion at 10%.

p-value: P(X ≥ 7) = 1 − P(X ≤ 6) = 1 − 0.9614 = 0.0386 M1 A1
Decision: 0.0386 < 0.10 ⇒ Reject H0 M1
Conclusion: "There is sufficient evidence at the 10% level to conclude that p > 0.3." A1

Example 8 — Interpreting Regression Coefficients in Context

The regression line for predicting salary (y, £thousands) from years of experience (x) is y = 18.4 + 2.3x. Interpret a and b.

Gradient b = 2.3: "For each additional year of experience, salary is predicted to increase by £2,300 on average." B1
Intercept a = 18.4: "The predicted starting salary (0 years of experience) is £18,400." B1
Note: if the data range for x starts at x = 3 years, then the intercept is an extrapolation and may not be reliable — this caveat can earn extra marks in Cambridge questions.

Common Mistakes

Mistake 1 — Using the Wrong Tail in a Hypothesis Test

Wrong: H1: p > 0.3, so I use P(X ≤ x) as my p-value.
Correct: H1: p > 0.3 means the right tail — use P(X ≥ x). H1: p < 0.3 means P(X ≤ x).

The direction of H1 tells you which tail to use. Right-tail tests use P(X ≥ observed), left-tail tests use P(X ≤ observed).

Mistake 2 — Not Stating Conclusion in Context

Wrong: "p-value = 0.03 < 0.05, therefore reject H0."
Correct: "There is sufficient evidence at the 5% level to conclude that the probability of a defective item has increased from 0.1."

Cambridge exam mark schemes almost always require the conclusion to reference the specific context — variable name, direction, and original claimed value.

Mistake 3 — Using p Instead of 1−p in the Wrong Tail

Wrong: For B(10, 0.7) and x = 3, H1: p < 0.7, calculating P(X ≥ 3) instead of P(X ≤ 3).
Correct: Use P(X ≤ 3) when testing the lower tail, regardless of the value of p0.

Be careful when p > 0.5 — the distribution is skewed left, so observed values less than the mean must use the lower tail probability.

Mistake 4 — Extrapolation Without Warning

Wrong: Using y = 4.5 + 1.5x to predict y when x = 100, when the data only covered x = 1 to 10, without noting the danger.
Correct: State "This is extrapolation beyond the range of the data and may be unreliable."

Cambridge questions often ask you to comment on the reliability of a prediction — always check whether the x-value is inside or outside the data range.

Mistake 5 — Claiming Correlation Implies Causation

Wrong: "r = 0.92 shows that revision causes exam scores to increase."
Correct: "r = 0.92 shows a strong positive linear correlation between revision time and exam score. We cannot conclude causation from correlation alone."

This is a classic exam trap. Always use "correlation" language, never "cause" language, unless the question explicitly asks about causation.

Mistake 6 — Two-Tailed Test: Not Halving the Significance Level

Wrong: For a two-tailed test at 5%, comparing P(X ≤ x) with 0.05.
Correct: For a two-tailed test at 5%, each tail must have probability ≤ 0.025.

The 5% is split equally between both tails in a two-tailed test — so you compare tail probabilities with 0.025 (or use critical values ±1.96 for Normal tests).

Mistake 7 — "Accepting" H₀

Wrong: "p-value = 0.12 > 0.05, therefore we accept H0."
Correct: "p-value = 0.12 > 0.05, therefore we do not reject H0."

A hypothesis test never proves H0 is true — it either provides enough evidence to reject it, or not enough. Say "do not reject" rather than "accept".

Mistake 8 — Forgetting √n in the Normal Test

Wrong: Z = (x̄ − μ0) / σ (forgetting to divide σ by √n).
Correct: Z = (x̄ − μ0) / (σ / √n). The SE of the mean is σ/√n, not σ.

This is the most common arithmetic error in Normal mean tests. The sample mean has standard deviation σ/√n, much smaller than the population SD σ.

Key Formulas

Correlation

r = Sxy / √(Sxx · Syy)

Sxx = Σx² − n x̄²  |  Syy = Σy² − n ȳ²  |  Sxy = Σxy − n x̄ ȳ

−1 ≤ r ≤ 1

Regression Line (y on x)

y = a + bx
b = Sxy / Sxx  |  a = ȳ − b x̄
Passes through (x̄, ȳ)

Hypothesis Test Structure

Step | What to Write
1 | State H0 and H1 with parameter and value
2 | State the distribution under H0: X ~ B(n, p0) or x̄ ~ N(μ0, σ²/n)
3 | Calculate the test statistic (p-value or Z)
4 | Compare with the significance level α or critical value
5 | State the conclusion in context

Critical Values — Normal Distribution N(0,1)

Test Type | Significance Level | Critical Value(s)
One-tailed (right) | 5% | z = 1.645
One-tailed (right) | 1% | z = 2.326
Two-tailed | 5% | z = ±1.96
Two-tailed | 1% | z = ±2.576
One-tailed (left) | 5% | z = −1.645

Normal Mean Test Statistic

Z = (x̄ − μ0) / (σ / √n)   where x̄ ~ N(μ0, σ²/n) under H0

Binomial Hypothesis Test

Under H0: X ~ B(n, p0)
One-tailed right: p-value = P(X ≥ xobs)
One-tailed left: p-value = P(X ≤ xobs)
Two-tailed: compare min(P(X ≤ x), P(X ≥ x)) with α/2

Interpreting Correlation

r value | Interpretation
r = 1 | Perfect positive linear correlation
0.8 ≤ r < 1 | Strong positive correlation
0.5 ≤ r < 0.8 | Moderate positive correlation
0 < r < 0.5 | Weak positive correlation
r = 0 | No linear correlation
−0.5 < r < 0 | Weak negative correlation
−0.8 < r ≤ −0.5 | Moderate negative correlation
−1 < r ≤ −0.8 | Strong negative correlation
r = −1 | Perfect negative linear correlation

Proof Bank

Proof 1 — The Regression Line Passes Through (x̄, ȳ)

The least squares regression line y = a + bx is found by minimising S = Σ(yi − a − bxi)².

Step 1: Differentiate S with respect to a and set equal to zero:

∂S/∂a = −2 Σ(yi − a − bxi) = 0

⇒ Σyi = na + b Σxi

⇒ ȳ = a + b x̄   (dividing both sides by n)

Step 2: This equation says ȳ = a + b x̄, which means the point (x̄, ȳ) lies exactly on the line y = a + bx.

Conclusion: The regression line always passes through the mean point (x̄, ȳ). This is a consequence of the least squares condition and holds for all data sets.

Proof 2 — r = 0 Does Not Mean No Relationship

Pearson's r measures only linear association. It is possible for r = 0 (or close to 0) while a strong non-linear relationship exists.

Demonstration: Consider data points at (x, y): (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4). These lie exactly on the parabola y = x².

Σx = 0, Σy = 10, Σxy = 0 (products of positive and negative values cancel).

Sxy = Σxy − n x̄ ȳ = 0 − 5(0)(2) = 0 ⇒ r = 0.

Conclusion: r = 0 means no linear correlation. There may still be a non-linear relationship. Always look at the scatter diagram and not just the value of r.
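The demonstration is easy to verify numerically; a short Python check of the parabola data:

```python
import math

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]          # points lying exactly on y = x²

n = len(xs)
x_bar = sum(xs) / n               # 0.0
y_bar = sum(ys) / n               # 2.0
sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
sxx = sum(x * x for x in xs) - n * x_bar ** 2
syy = sum(y * y for y in ys) - n * y_bar ** 2
r = sxy / math.sqrt(sxx * syy)
print(r)   # 0.0: no linear correlation, despite the exact quadratic relationship
```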

Proof 3 — Formula for Gradient b

The least squares gradient b minimises Σ(yi − a − bxi)². After finding a = ȳ − b x̄, substitute back and differentiate with respect to b:

∂S/∂b = −2 Σxi(yi − ȳ − b(xi − x̄)) = 0

⇒ Σxi(yi − ȳ) = b Σxi(xi − x̄)

⇒ Sxy = b · Sxx

b = Sxy / Sxx

This confirms the formula for the regression gradient as stated in the syllabus.

Proof 4 — Why |r| ≤ 1 (Cauchy-Schwarz)

By the Cauchy-Schwarz inequality applied to vectors ui = xi − x̄ and vi = yi − ȳ:

(Σ ui vi)² ≤ (Σ ui²)(Σ vi²)

i.e., Sxy² ≤ Sxx · Syy

Dividing both sides by Sxx · Syy (both positive):

r² = Sxy² / (Sxx · Syy) ≤ 1

Therefore −1 ≤ r ≤ 1. Equality holds when all (xi − x̄) are proportional to (yi − ȳ), i.e., when all points lie on a straight line.

Scatter Plot Visualiser

Enter 5 data points (x, y). The tool will plot them, draw the estimated regression line, and compute r.


Exercise 1 — Correlation Interpretation (10 questions)

Exercise 2 — Regression Line Calculations (10 questions)

Exercise 3 — Hypothesis Test Setup: H₀, H₁, Tails (10 questions)

Exercise 4 — One-Tailed Binomial Hypothesis Tests (10 questions)

Exercise 5 — Two-Tailed Tests and Critical Regions (10 questions)

Practice — 30 Mixed Questions

Challenge — 15 Harder Questions

Exam Style Questions (8 Cambridge S1 Style)

Question 1 [5 marks]

The following data on advertising spend (x, £hundreds) and sales (y, £thousands) for 6 months are summarised as: Σx = 42, Σy = 78, Σx² = 330, Σy² = 1062, Σxy = 582, n = 6.

(i) Find Sxx, Syy and Sxy. [3]

(ii) Calculate the product moment correlation coefficient r. [2]

x̄ = 7, ȳ = 13
Sxx = 330 − 6(49) = 330 − 294 = 36 [M1 A1]
Syy = 1062 − 6(169) = 1062 − 1014 = 48 [A1]
Sxy = 582 − 6(7)(13) = 582 − 546 = 36 [A1]
r = 36 / √(36 × 48) = 36 / √1728 = 36 / 41.57 ≈ 0.866 [M1 A1]
This indicates a strong positive linear correlation between advertising spend and sales.

Question 2 [6 marks]

For a sample of 8 data points: Sxy = 64, Sxx = 80, x̄ = 4.5, ȳ = 9.2.

(i) Find the equation of the regression line y = a + bx. [3]

(ii) Predict y when x = 6. [1]

(iii) Would it be sensible to use your line to predict y when x = 25? Justify your answer. [2]

(i) b = 64/80 = 0.8 [M1 A1]
a = 9.2 − 0.8(4.5) = 9.2 − 3.6 = 5.6 [M1 A1]
y = 5.6 + 0.8x [A1]
(ii) y = 5.6 + 0.8(6) = 5.6 + 4.8 = 10.4 [B1]
(iii) No — x = 25 is beyond the range of the data. This is extrapolation and the linear relationship may not hold outside the observed range. [B1 B1]

Question 3 [5 marks]

A company claims that 25% of customers prefer product A. In a random sample of 18 customers, only 1 prefers product A. Test at 5% significance whether the proportion preferring product A has decreased.

H0: p = 0.25   H1: p < 0.25 (one-tailed) [B1]
Under H0: X ~ B(18, 0.25) [M1]
p-value = P(X ≤ 1) = P(X=0) + P(X=1)
= (0.75)¹⁸ + 18(0.25)(0.75)¹⁷
≈ 0.00564 + 0.03383 = 0.0395 [M1 A1]
0.0395 < 0.05 ⇒ Reject H0 [M1]
"There is sufficient evidence at 5% to conclude the proportion preferring product A has decreased from 0.25." [A1]

Question 4 [6 marks]

A random variable X ~ B(20, p). Test H0: p = 0.35 against H1: p ≠ 0.35 at 5%. A sample gives x = 12.

(i) State the distribution of X under H0 and identify the relevant tail. [2]

(ii) Find the p-value. [2]

(iii) State your conclusion in context. [2]

(i) Under H0: X ~ B(20, 0.35). Mean = 7, x = 12 > 7 so upper tail. [B1 B1]
(ii) P(X ≥ 12) = 1 − P(X ≤ 11) = 1 − 0.9804 = 0.0196 [M1 A1]
(iii) 0.0196 < 0.025 (half of 5%) ⇒ Reject H0.
"There is sufficient evidence at 5% to conclude that p has changed from 0.35." [M1 A1]

Question 5 [5 marks]

Find the critical region for testing H0: p = 0.2 against H1: p > 0.2 using B(25, 0.2) at the 5% significance level. State the actual significance level.

Find smallest c: P(X ≥ c) ≤ 0.05 [M1]
P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − 0.9532 = 0.0468 ≤ 0.05 ✓ [M1 A1]
P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − 0.8910 = 0.1090 > 0.05 ✗
Critical region: X ≥ 9 [A1]
Actual significance level = P(X ≥ 9) = 4.68% [A1]

Question 6 [6 marks]

Weights of apples are normally distributed with standard deviation 15 g. A sample of 36 apples has mean 182 g. The orchard claims the mean weight is 185 g. Test this claim at 5% (two-tailed).

H0: μ = 185   H1: μ ≠ 185 [B1 B1]
Z = (182 − 185) / (15/√36) = −3 / 2.5 = −1.2 [M1 A1]
Critical region: |Z| > 1.96 at 5% two-tailed [B1]
|−1.2| = 1.2 < 1.96 ⇒ Do not reject H0 [M1]
"Insufficient evidence at 5% to conclude that the mean apple weight differs from 185 g." [A1]

Question 7 [5 marks]

A biologist records temperature x (°C) and enzyme activity y for 10 samples. The results give r = 0.73. The biologist says "Higher temperature causes higher enzyme activity". Comment on this statement.

r = 0.73 indicates a moderate to strong positive linear correlation. [B1 B1]
However, correlation does not imply causation. [B1]
There may be a confounding variable, or the relationship may be coincidental. [B1]
The biologist should not claim causation based on correlation alone. [B1]

Question 8 [7 marks]

A machine fills bottles of water. The volume filled (ml) is N(μ, σ²) with σ = 5 ml. The machine is set to fill μ = 500 ml. A quality controller suspects the mean has changed. She takes a sample of 25 bottles and records a mean of 502.4 ml.

(i) Write down H0 and H1. [2]

(ii) Carry out the test at the 5% significance level. [3]

(iii) Find the set of values of x̄ that would lead to rejection of H0. [2]

(i) H0: μ = 500   H1: μ ≠ 500 [B1 B1]
(ii) Z = (502.4 − 500) / (5/√25) = 2.4/1 = 2.4 [M1 A1]
|Z| = 2.4 > 1.96 ⇒ Reject H0. "Sufficient evidence at 5% that the mean volume has changed from 500 ml." [M1 A1]
(iii) Critical region in terms of x̄: x̄ < 500 − 1.96 × (5/√25) or x̄ > 500 + 1.96 × (5/√25) ⇒ x̄ < 498.04 or x̄ > 501.96 [M1 A1]

Past Paper Questions

Past Paper 1 — Correlation and Regression (Cambridge 9709 S1 style)

Data on 7 students: time spent on homework x (hours/week) and test score y (%). Summary statistics: Σx = 35, Σy = 455, Σx² = 199, Σy² = 29855, Σxy = 2345.

(i) Calculate Sxx, Syy and Sxy. [3]

(ii) Find r and comment on its value. [3]

(iii) Find the regression line y on x. [3]

(iv) Predict the score for a student doing 6 hours per week. State whether this is interpolation or extrapolation. [2]

x̄ = 5, ȳ = 65
Sxx = 199 − 7(25) = 199 − 175 = 24 [A1]
Syy = 29855 − 7(4225) = 29855 − 29575 = 280 [A1]
Sxy = 2345 − 7(5)(65) = 2345 − 2275 = 70 [A1]
r = 70/√(24×280) = 70/√6720 = 70/81.97 ≈ 0.854 [M1 A1]
"Strong positive linear correlation between homework time and test score." [B1]
b = 70/24 ≈ 2.917, a = 65 − 2.917(5) = 65 − 14.58 = 50.42 [M1 A1 A1]
Regression line: y = 50.42 + 2.917x [A1]
When x = 6: y = 50.42 + 2.917(6) = 50.42 + 17.5 = 67.9 [B1]
x = 6 is close to x̄ = 5 and lies within the range of the observed data, so this is interpolation and the prediction is reliable [B1]

Past Paper 2 — Binomial Hypothesis Test (Cambridge 9709 S1 style)

Historically, 30% of emails received by an office are spam. Following installation of a new filter, a sample of 20 emails contains 2 spam emails. Test at 5% whether the proportion of spam has decreased.

H0: p = 0.3   H1: p < 0.3 (one-tailed left) [B1 B1]
Under H0: X ~ B(20, 0.3) [B1]
p-value = P(X ≤ 2) = P(0) + P(1) + P(2)
= (0.7)²⁰ + 20(0.3)(0.7)¹⁹ + C(20,2)(0.3)²(0.7)¹⁸
≈ 0.000798 + 0.006839 + 0.027846 ≈ 0.0355 [M1 A1]
0.0355 < 0.05 ⇒ Reject H0 [M1]
"Sufficient evidence at 5% that the proportion of spam has decreased from 0.3." [A1]

Past Paper 3 — Critical Region (Cambridge 9709 S1 style)

A biased coin has P(heads) = p. In 15 tosses, 13 show heads. Find the critical region for testing H0: p = 0.6 against H1: p > 0.6 at 5%. Is the observed result in the critical region?

Under H0: X ~ B(15, 0.6) [B1]
Find smallest c: P(X ≥ c) ≤ 0.05:
P(X ≥ 13) = P(13) + P(14) + P(15) = C(15,13)(0.6)¹³(0.4)² + C(15,14)(0.6)¹⁴(0.4) + (0.6)¹⁵
≈ 0.0219 + 0.0047 + 0.0005 = 0.0271 ≤ 0.05 ✓ [M1 A1]
P(X ≥ 12) ≈ 0.0905 > 0.05 ✗
Critical region: X ≥ 13 [A1]
Observed x = 13 ≥ 13 ⇒ In critical region ⇒ Reject H0 [B1]
"Sufficient evidence at 5% that p > 0.6." [B1]

Past Paper 4 — Normal Mean Test (Cambridge 9709 S1 style)

The breaking strength of cables produced by a factory is normally distributed with mean μ and standard deviation 12 N. The target mean is 80 N. A new batch of 49 cables has mean breaking strength 77.2 N. Test at 1% significance whether the mean breaking strength has fallen below the target.

H0: μ = 80   H1: μ < 80 (one-tailed left) [B1 B1]
Z = (77.2 − 80) / (12/√49) = −2.8 / (12/7) = −2.8 / 1.714 = −1.633 [M1 A1]
Critical value at 1% one-tailed: z = −2.326 [B1]
−1.633 > −2.326 ⇒ Do not reject H0 [M1]
"Insufficient evidence at 1% to conclude mean breaking strength has fallen below 80 N." [A1]

Past Paper 5 — Mixed: Regression and Hypothesis Test (Cambridge 9709 S1 style)

For 10 pairs of data, Sxy = −42, Sxx = 60, Syy = 35, x̄ = 8, ȳ = 14.

(i) Calculate r. Interpret this in context where x = hours of TV watched daily and y = reading test score. [3]

(ii) Find the regression line y on x. [3]

(iii) A student watches 5 hours of TV per day. Predict their reading score and state whether this is reliable. [2]

(i) r = −42/√(60×35) = −42/√2100 = −42/45.826 ≈ −0.917 [M1 A1]
"Strong negative linear correlation — students who watch more TV tend to have lower reading scores." [B1]
(ii) b = −42/60 = −0.7, a = 14 − (−0.7)(8) = 14 + 5.6 = 19.6 [M1 A1 A1]
Regression line: y = 19.6 − 0.7x [A1]
(iii) y = 19.6 − 0.7(5) = 19.6 − 3.5 = 16.1 [B1]
If data range includes x = 5, this is interpolation and reliable; if not, it is extrapolation. [B1]