
Further Statistics (Statistics 1)

Grade 11 · Statistics 1 · Cambridge A-Level 9709 · Age 16–17

Welcome to Further Statistics!

This topic covers the final and most powerful ideas in Cambridge A-Level Statistics 1 (9709): measuring the strength and direction of relationships between variables using correlation, finding the line of best fit using linear regression, and drawing evidence-based conclusions using hypothesis testing. These tools are used throughout science, economics, medicine, and data analysis.

r = Sxy / √(Sxx · Syy)  |  y = a + bx, b = Sxy/Sxx  |  Z = (x̄ − μ0) / (σ/√n)

Learning Objectives

  • Calculate and interpret Pearson's product moment correlation coefficient r
  • Understand that −1 ≤ r ≤ 1 and interpret strength/direction of correlation
  • Identify that correlation does not imply causation
  • Find the equation of the regression line y = a + bx using Sxy and Sxx
  • Interpret the gradient b and intercept a in context
  • Understand the danger of extrapolation outside the data range
  • Set up null and alternative hypotheses H0 and H1
  • Choose one-tailed or two-tailed tests and identify the critical region
  • Conduct hypothesis tests for the probability parameter p in B(n, p)
  • Conduct hypothesis tests for the population mean μ using the Normal distribution
  • Write conclusions in context at a given significance level

Correlation r

Measures linear association: r = ±1 perfect; r = 0 no linear correlation

Scatter Diagrams

Visual display of bivariate data — look for pattern, direction, and outliers

Regression Line

y = a + bx (least squares); passes through (x̄, ȳ)

Extrapolation

Using the line outside the data range — unreliable and not recommended

H₀ and H₁

Null hypothesis (assumed true) vs alternative (what we want to test)

Critical Region

Values of the test statistic that lead to rejecting H0

Binomial Test

Test p using B(n, p0) — compare tail probability with α

Normal Mean Test

Standardise x̄ using Z = (x̄ − μ0) / (σ/√n); compare with zα

Learn 1 — Correlation

What Is Correlation?

Correlation measures the strength and direction of a linear relationship between two variables x and y. It is quantified by Pearson's product moment correlation coefficient r, which always lies between −1 and 1 (inclusive).

−1 ≤ r ≤ 1

Interpreting r

r = +1: Perfect positive linear correlation — as x increases, y increases exactly along a straight line.
r = −1: Perfect negative linear correlation — as x increases, y decreases exactly along a straight line.
r = 0: No linear correlation — knowing x tells you nothing about y (on a linear basis).

Strength guidelines (approximate):
• |r| ≥ 0.8 — strong correlation
• 0.5 ≤ |r| < 0.8 — moderate correlation
• |r| < 0.5 — weak correlation

Direction: r > 0 means positive correlation; r < 0 means negative correlation.

Calculating r

The formula uses three summary statistics: Sxx, Syy, and Sxy.

r = Sxy / √(Sxx · Syy)

where   Sxx = Σx² − n x̄²  |  Syy = Σy² − n ȳ²  |  Sxy = Σxy − n x̄ ȳ
Example: Five data points give Sxy = 48, Sxx = 60, Syy = 50. Find r.

r = 48 / √(60 × 50) = 48 / √3000 = 48 / 54.77 ≈ 0.876

Conclusion: strong positive linear correlation between x and y.
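As a quick sanity check, the calculation above can be reproduced in a few lines of Python (the helper name `correlation` is ours, for illustration only):

```python
import math

def correlation(sxy: float, sxx: float, syy: float) -> float:
    """Pearson's r from the summary statistics Sxy, Sxx, Syy."""
    return sxy / math.sqrt(sxx * syy)

# Worked example from above: Sxy = 48, Sxx = 60, Syy = 50
r = correlation(48, 60, 50)
print(round(r, 3))  # 0.876
```

The same helper confirms that any valid summary statistics give −1 ≤ r ≤ 1.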

Scatter Diagrams

Before calculating r, always draw (or inspect) a scatter diagram — a graph where each data point (x, y) is plotted as a dot. Look for:

Direction: points rising left-to-right (positive) or falling (negative)?
Linearity: do points cluster around a straight line, or a curve?
Strength: how tightly do points cluster around the trend line?
Outliers: any points far from the main pattern? These may heavily affect r.

Correlation Does NOT Imply Causation

Critical concept: Even if r is close to ±1, this does NOT mean x causes y to change. Both variables might be influenced by a third (confounding) variable, or the correlation could be coincidental.

Example: Ice cream sales and drowning rates are positively correlated — but ice cream does not cause drowning. Both increase in hot weather (a confounding variable).

Interpreting r in Context

When asked to interpret r in an exam:

1. State the strength: strong / moderate / weak.
2. State the direction: positive / negative.
3. State the variables: "there is a strong positive linear correlation between revision time and exam score."
4. Do NOT say "revision time causes exam score to increase" unless the question asks about causation specifically.
In the Cambridge S1 exam, you are often given r and asked to "comment on" or "interpret" it. Always use the context of the question, state the direction and strength, and never claim causation unless instructed.

Learn 2 — Linear Regression

The Regression Line y on x

When there is a linear relationship between x and y, we can find the least squares regression line y = a + bx. This line minimises the sum of squared vertical distances from each point to the line — giving the "best fit" straight line.

y = a + bx  |  b = Sxy / Sxx  |  a = ȳ − b x̄

Calculating b and a

Step 1: Calculate Sxy = Σxy − n x̄ ȳ and Sxx = Σx² − n x̄²
Step 2: b = Sxy / Sxx
Step 3: a = ȳ − b x̄
Step 4: Write the equation as y = a + bx
Example: n = 6, Σx = 30, Σy = 48, Σx² = 182, Σxy = 262.

x̄ = 30/6 = 5,   ȳ = 48/6 = 8
Sxx = 182 − 6(5)² = 182 − 150 = 32
Sxy = 262 − 6(5)(8) = 262 − 240 = 22
b = 22/32 = 0.6875
a = 8 − 0.6875 × 5 = 8 − 3.4375 = 4.5625
Regression line: y = 4.5625 + 0.6875x
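The four steps above can be sketched in Python; `regression_line` is an illustrative helper working directly from the summary sums:

```python
def regression_line(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares coefficients a, b for y = a + bx from summary sums."""
    x_bar, y_bar = sum_x / n, sum_y / n
    sxx = sum_x2 - n * x_bar ** 2      # Step 1
    sxy = sum_xy - n * x_bar * y_bar   # Step 1
    b = sxy / sxx                      # Step 2
    a = y_bar - b * x_bar              # Step 3
    return a, b

# Worked example: n = 6, Σx = 30, Σy = 48, Σx² = 182, Σxy = 262
a, b = regression_line(6, 30, 48, 182, 262)
print(a, b)       # 4.5625 0.6875
print(a + b * 7)  # prediction at x = 7: 9.375
```

The final line reproduces the prediction made later in this section.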

The Regression Line Passes Through (x̄, ȳ)

The least squares regression line always passes through the mean point (x̄, ȳ). This is because a = ȳ − b x̄, so substituting x = x̄ gives y = a + b x̄ = ȳ. Use this as a check.

Always verify: does your regression line pass through (x̄, ȳ)? Substitute x = x̄ — if you don't get ȳ, you have made an arithmetic error in calculating a.

Using the Regression Line for Prediction

Example (continued): Predict y when x = 7.
y = 4.5625 + 0.6875(7) = 4.5625 + 4.8125 = 9.375

Interpreting the Gradient b

The gradient b means: for each unit increase in x, y increases by b units (on average).

Example: If x = study hours and y = exam score, and b = 3.2, then:
"For each additional hour of study, the predicted exam score increases by 3.2 marks."

Interpreting the Intercept a

The intercept a is the predicted value of y when x = 0.

Caution: This may not have a meaningful interpretation if x = 0 is outside the data range (e.g., if x = height, x = 0 is nonsensical).

Extrapolation — The Danger Zone

Extrapolation means using the regression line to predict y for x values outside the range of the original data. This is unreliable because:
• The linear relationship may not hold beyond the data range.
• There may be natural limits or non-linearities not captured.

Interpolation (predicting within the data range) is generally reliable.

Learn 3 — Hypothesis Testing Introduction

What Is Hypothesis Testing?

A hypothesis test uses sample data to decide whether there is sufficient evidence to reject an assumption about a population. It provides a formal framework for drawing conclusions under uncertainty.

Step 1 — State the Hypotheses

Null Hypothesis H0: The "default" assumption — what we assume to be true unless the data provides strong evidence against it. Always contains an equals sign (e.g., H0: p = 0.3).

Alternative Hypothesis H1: What we are trying to find evidence for. Dictates the direction of the test.

One-Tailed vs Two-Tailed Tests

One-tailed test (right): H1: p > p0 — we suspect the true value is higher than assumed.
One-tailed test (left): H1: p < p0 — we suspect the true value is lower than assumed.
Two-tailed test: H1: p ≠ p0 — we suspect the true value has changed (direction unknown).

How to decide: Read the question carefully. Words like "has increased", "is higher", "more likely" suggest a one-tailed test. "Has changed", "is different" suggest two-tailed.

Step 2 — Choose Significance Level α

The significance level α is the probability of incorrectly rejecting H0 when it is actually true (a Type I error). Common values: α = 0.05 (5%) or α = 0.01 (1%).

Step 3 — Find the Critical Region

The critical region is the set of values of the test statistic that would lead to rejecting H0.

For one-tailed tests at 5%: for a right-tail test, find the smallest critical value c such that P(X ≥ c) ≤ 0.05; for a left-tail test, find the largest c such that P(X ≤ c) ≤ 0.05.

For two-tailed tests at 5%: split the 5% equally — find critical regions in both tails, each with probability ≤ 2.5%.

Step 4 — Compare Test Statistic with Critical Region

Calculate the probability of observing a result as extreme as (or more extreme than) the data, assuming H0 is true. This is the p-value.

If p-value ≤ α, reject H0.   If p-value > α, do not reject H0.

Step 5 — Write Conclusion in Context

If you reject H0: "There is sufficient evidence at the X% significance level to reject H0. The data suggests [H1 stated in context]."

If you do not reject H0: "There is insufficient evidence at the X% significance level to reject H0. The data is consistent with [H0 stated in context]."
Never say "we accept H0" — we either "reject H0" or "do not reject H0". Failing to reject H0 does not prove it is true; it just means the evidence against it was insufficient.
In Cambridge S1, marks are lost by failing to state the conclusion in context. Always refer back to the specific situation in the question — don't just say "reject H₀", say what that means for the problem (e.g., "The probability that the coin shows heads has changed").

Learn 4 — Testing Binomial Probability

Setting Up the Test

When we observe X successes in n trials and want to test a claimed probability p0, we use the model X ~ B(n, p0) under H0.

H0: p = p0  |  H1: p > p0 (or < or ≠)

One-Tailed Test — Right Tail (H₁: p > p₀)

Example: A seed company claims germination rate is 0.4. A gardener plants 20 seeds; 12 germinate. Test at 5% whether the germination rate has increased.

H0: p = 0.4    H1: p > 0.4 (one-tailed right)

Under H0: X ~ B(20, 0.4)
Test statistic: x = 12
p-value = P(X ≥ 12) = 1 − P(X ≤ 11)
Using tables: P(X ≤ 11) = 0.9435, so P(X ≥ 12) = 1 − 0.9435 = 0.0565

Since 0.0565 > 0.05, we do not reject H0.
Conclusion: "There is insufficient evidence at the 5% level to conclude that the germination rate has increased from 0.4."
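The same tail probability can be computed exactly, with no tables, using a short Python sketch; `binom_cdf` is an inline helper, not a library function:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Germination example: X ~ B(20, 0.4), observed x = 12, H1: p > 0.4
p_value = 1 - binom_cdf(11, 20, 0.4)   # P(X >= 12)
print(round(p_value, 4))               # 0.0565
print(p_value > 0.05)                  # True, so do not reject H0
```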

One-Tailed Test — Left Tail (H₁: p < p₀)

Example: A drug is claimed to cure 70% of patients (p = 0.7). In a trial of 15 patients, only 6 are cured. Test at 5% whether the cure rate has decreased.

H0: p = 0.7    H1: p < 0.7 (one-tailed left)

Under H0: X ~ B(15, 0.7)
p-value = P(X ≤ 6)
Using tables (via Y = 15 − X ~ B(15, 0.3), so P(X ≤ 6) = P(Y ≥ 9)): P(X ≤ 6) = 0.0152

Since 0.0152 < 0.05, we reject H0.
Conclusion: "There is sufficient evidence at the 5% significance level that the cure rate has decreased from 0.7."

Two-Tailed Test (H₁: p ≠ p₀)

For a two-tailed test at 5%, compare each tail probability with 2.5% (i.e., 0.025).

Example: A fair coin is tossed 20 times; 14 heads. Test at 5% whether the coin is biased.

H0: p = 0.5    H1: p ≠ 0.5 (two-tailed)

Under H0: X ~ B(20, 0.5)
x = 14, which is above the mean (10), so test the upper tail:
P(X ≥ 14) = 1 − P(X ≤ 13) = 1 − 0.9423 = 0.0577

Since 0.0577 > 0.025, we do not reject H0.
Conclusion: "There is insufficient evidence at the 5% level to conclude that the coin is biased."
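A minimal check of the two-tailed calculation, using an exact binomial CDF helper (ours, not a library call) and comparing the tail with α/2 = 0.025:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Coin example: X ~ B(20, 0.5), observed x = 14, above the mean of 10,
# so we examine the upper tail and compare it with 0.025.
upper_tail = 1 - binom_cdf(13, 20, 0.5)   # P(X >= 14)
print(round(upper_tail, 4))               # 0.0577
print(upper_tail <= 0.025)                # False, so do not reject H0
```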

Finding the Critical Region

Instead of comparing a specific value, we find the range of values that would lead to rejection.

Example: X ~ B(20, 0.3) under H0. Find the critical region for H1: p > 0.3 at 5%.

Find smallest c such that P(X ≥ c) ≤ 0.05:
P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0.9520 = 0.0480 ≤ 0.05 ✓
P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − 0.8867 = 0.1133 > 0.05 ✗

Critical region: X ≥ 10
For two-tailed critical regions, find the smallest c_upper such that P(X ≥ c_upper) ≤ 0.025, AND the largest c_lower such that P(X ≤ c_lower) ≤ 0.025. The critical region is X ≤ c_lower or X ≥ c_upper.
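The search for the smallest critical value can be automated; a sketch, with the helper `binom_tail_upper` defined inline:

```python
from math import comb

def binom_tail_upper(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(c, n + 1))

# X ~ B(20, 0.3) under H0, H1: p > 0.3, alpha = 0.05:
# find the smallest c with P(X >= c) <= 0.05
c = next(c for c in range(21) if binom_tail_upper(c, 20, 0.3) <= 0.05)
print(c)                                       # 10
print(round(binom_tail_upper(c, 20, 0.3), 4))  # 0.048, the actual significance level
```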

Learn 5 — Testing the Normal Mean

When to Use This Test

When a population is normally distributed with known variance σ², and we take a sample of size n, the sample mean x̄ has distribution:

x̄ ~ N(μ, σ²/n)

We can test whether the true population mean μ equals a specific value μ0.

Setting Up the Test

H0: μ = μ0
H1: μ ≠ μ0 (two-tailed) or μ > μ0 or μ < μ0 (one-tailed)

The Test Statistic Z

Under H0, standardise the sample mean:

Z = (x̄ − μ0) / (σ / √n)

Under H0, Z ~ N(0,1). Compare Z with critical values from the standard Normal table.

Critical Values

Test type | Level α | Critical value | Reject H0 if
One-tailed (right) | 5% | 1.645 | Z > 1.645
One-tailed (left) | 5% | −1.645 | Z < −1.645
Two-tailed | 5% | ±1.96 | |Z| > 1.96
One-tailed (right) | 1% | 2.326 | Z > 2.326
Two-tailed | 1% | ±2.576 | |Z| > 2.576

Full Worked Example — Two-Tailed

Example: The masses of bags of flour are normally distributed with σ = 8 g. The manufacturer claims μ = 500 g. A sample of 25 bags has mean x̄ = 496.4 g. Test at 5% whether the mean has changed.

H0: μ = 500    H1: μ ≠ 500 (two-tailed)

Z = (496.4 − 500) / (8 / √25) = −3.6 / 1.6 = −2.25

Critical region: |Z| > 1.96
|−2.25| = 2.25 > 1.96   ⇒ Reject H0

Conclusion: "There is sufficient evidence at the 5% significance level to conclude that the mean mass of bags has changed from 500 g."

Full Worked Example — One-Tailed

Example: A teacher claims the mean test score is 60. After a new teaching method, a sample of 16 students has mean x̄ = 64. Given σ = 10, test at 1% whether the mean has increased.

H0: μ = 60    H1: μ > 60 (one-tailed right)

Z = (64 − 60) / (10 / √16) = 4 / 2.5 = 1.6

Critical value at 1% one-tailed: 2.326
1.6 < 2.326   ⇒ Do not reject H0

Conclusion: "There is insufficient evidence at the 1% significance level to conclude that the mean test score has increased from 60."
Always show the calculation of Z, state the critical value, state whether |Z| exceeds it, and then write a full conclusion in context. Each of these earns marks in Cambridge exams.
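Both worked examples can be verified with one small helper (`z_statistic` is an illustrative name, not a library function):

```python
import math

def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Test statistic Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

# Two-tailed flour example: x_bar = 496.4, mu0 = 500, sigma = 8, n = 25
z1 = z_statistic(496.4, 500, 8, 25)
print(round(z1, 2), abs(z1) > 1.96)   # -2.25 True  -> reject H0

# One-tailed test-score example: x_bar = 64, mu0 = 60, sigma = 10, n = 16
z2 = z_statistic(64, 60, 10, 16)
print(round(z2, 2), z2 > 2.326)       # 1.6 False -> do not reject H0
```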

Worked Examples

Example 1 — Describing Correlation from r

r = −0.87. Interpret this value in the context of the study: x = temperature (°C), y = hot drink sales per day.

Step 1: Note the sign: r is negative ⇒ negative correlation. M1
Step 2: Note the magnitude: |r| = 0.87, which is close to 1 ⇒ strong correlation. M1
Answer: "There is a strong negative linear correlation between temperature and hot drink sales. As temperature increases, hot drink sales tend to decrease." A1
Do not say "r = −0.87 means temperature causes fewer drinks to be sold." Correlation does not imply causation.

Example 2 — Regression Prediction with Interpretation

Given Sxy = 126, Sxx = 84, x̄ = 5, ȳ = 12. (i) Find the regression line. (ii) Predict y when x = 7. (iii) Interpret the gradient.

Step 1 (i): b = Sxy/Sxx = 126/84 = 1.5 M1
Step 2 (i): a = ȳ − b x̄ = 12 − 1.5 × 5 = 12 − 7.5 = 4.5 M1 A1
Regression line: y = 4.5 + 1.5x A1
Step 3 (ii): When x = 7: y = 4.5 + 1.5(7) = 4.5 + 10.5 = 15 B1
Step 4 (iii): "For each unit increase in x, y increases by 1.5 units on average." B1

Example 3 — One-Tailed Binomial Test (Full)

A coin is claimed to be fair. It is tossed 10 times and shows 8 heads. Test at 5% whether the probability of heads has increased.

H0: p = 0.5   H1: p > 0.5 (one-tailed right) B1
Under H0: X ~ B(10, 0.5), x = 8 M1
p-value: P(X ≥ 8) = P(X=8) + P(X=9) + P(X=10)
= C(10,8)(0.5)¹⁰ + C(10,9)(0.5)¹⁰ + C(10,10)(0.5)¹⁰
= (45 + 10 + 1)/1024 = 56/1024 ≈ 0.0547 M1 A1
Decision: 0.0547 > 0.05 ⇒ Do not reject H0 M1
Conclusion: "There is insufficient evidence at the 5% level to conclude that the probability of heads has increased from 0.5." A1

Example 4 — Two-Tailed Binomial Test

A thumbtack lands point-up with probability p. In 15 trials, it lands point-up 3 times. Test H0: p = 0.4 against H1: p ≠ 0.4 at 5%.

H0: p = 0.4   H1: p ≠ 0.4 (two-tailed) B1
Under H0: X ~ B(15, 0.4). x = 3 is below the mean (15 × 0.4 = 6). Test lower tail. M1
p-value (lower tail): P(X ≤ 3) = 0.0905 (from tables) M1 A1
Decision: Compare with 0.025 (two-tailed): 0.0905 > 0.025 ⇒ Do not reject H0 M1
Conclusion: "Insufficient evidence at 5% to conclude that p has changed from 0.4." A1

Example 5 — Finding the Critical Region (Binomial)

X ~ B(20, 0.25) under H0. Find the critical region for H1: p < 0.25 at 5%.

Lower tail: Find largest c such that P(X ≤ c) ≤ 0.05 M1
Check: P(X ≤ 2) = 0.0913 > 0.05 ✗   P(X ≤ 1) = 0.0243 ≤ 0.05 ✓ M1 A1
Critical region: X ≤ 1 A1
Actual significance level: P(X ≤ 1) = 0.0243 = 2.43% B1

Example 6 — Two-Tailed Normal Mean Test

Heights of adult males are N(μ, 49). A sample of 36 gives x̄ = 172.8 cm. Test H0: μ = 175 at 5% (two-tailed).

H0: μ = 175   H1: μ ≠ 175 (two-tailed) B1
Z = (172.8 − 175) / (7 / √36) = −2.2 / 1.1667 = −1.886 M1 A1
Critical region: |Z| > 1.96 at 5% two-tailed B1
Decision: |−1.886| = 1.886 < 1.96 ⇒ Do not reject H0 M1
Conclusion: "Insufficient evidence at 5% to conclude mean height differs from 175 cm." A1

Example 7 — Finding the p-value for a Binomial Test

Under H0: X ~ B(12, 0.3). Observed x = 7. H1: p > 0.3. Find the p-value and state the conclusion at 10%.

p-value: P(X ≥ 7) = 1 − P(X ≤ 6) = 1 − 0.9614 = 0.0386 M1 A1
Decision: 0.0386 < 0.10 ⇒ Reject H0 M1
Conclusion: "There is sufficient evidence at the 10% level to conclude that p > 0.3." A1

Example 8 — Interpreting Regression Coefficients in Context

The regression line for predicting salary (y, £thousands) from years of experience (x) is y = 18.4 + 2.3x. Interpret a and b.

Gradient b = 2.3: "For each additional year of experience, salary is predicted to increase by £2,300 on average." B1
Intercept a = 18.4: "The predicted starting salary (0 years of experience) is £18,400." B1
Note: if the data range for x starts at x = 3 years, then the intercept is an extrapolation and may not be reliable — this caveat can earn extra marks in Cambridge questions.

Common Mistakes

Mistake 1 — Using the Wrong Tail in a Hypothesis Test

Wrong: H1: p > 0.3, so I use P(X ≤ x) as my p-value.
Correct: H1: p > 0.3 means the right tail — use P(X ≥ x). H1: p < 0.3 means P(X ≤ x).

The direction of H1 tells you which tail to use. Right-tail tests use P(X ≥ observed), left-tail tests use P(X ≤ observed).

Mistake 2 — Not Stating Conclusion in Context

Wrong: "p-value = 0.03 < 0.05, therefore reject H0."
Correct: "There is sufficient evidence at the 5% level to conclude that the probability of a defective item has increased from 0.1."

Cambridge exam mark schemes almost always require the conclusion to reference the specific context — variable name, direction, and original claimed value.

Mistake 3 — Using p Instead of 1−p in the Wrong Tail

Wrong: For B(10, 0.7) and x = 3, H1: p < 0.7, calculating P(X ≥ 3) instead of P(X ≤ 3).
Correct: Use P(X ≤ 3) when testing the lower tail, regardless of the value of p0.

Be careful when p > 0.5 — the distribution is skewed left, so observed values less than the mean must use the lower tail probability.

Mistake 4 — Extrapolation Without Warning

Wrong: Using y = 4.5 + 1.5x to predict y when x = 100, when the data only covered x = 1 to 10, without noting the danger.
Correct: State "This is extrapolation beyond the range of the data and may be unreliable."

Cambridge questions often ask you to comment on the reliability of a prediction — always check whether the x-value is inside or outside the data range.

Mistake 5 — Claiming Correlation Implies Causation

Wrong: "r = 0.92 shows that revision causes exam scores to increase."
Correct: "r = 0.92 shows a strong positive linear correlation between revision time and exam score. We cannot conclude causation from correlation alone."

This is a classic exam trap. Always use "correlation" language, never "cause" language, unless the question explicitly asks about causation.

Mistake 6 — Two-Tailed Test: Not Halving the Significance Level

Wrong: For a two-tailed test at 5%, comparing P(X ≤ x) with 0.05.
Correct: For a two-tailed test at 5%, each tail must have probability ≤ 0.025.

The 5% is split equally between both tails in a two-tailed test — so you compare tail probabilities with 0.025 (or use critical values ±1.96 for Normal tests).

Mistake 7 — "Accepting" H₀

Wrong: "p-value = 0.12 > 0.05, therefore we accept H0."
Correct: "p-value = 0.12 > 0.05, therefore we do not reject H0."

A hypothesis test never proves H0 is true — it either provides enough evidence to reject it, or not enough. Say "do not reject" rather than "accept".

Mistake 8 — Forgetting √n in the Normal Test

Wrong: Z = (x̄ − μ0) / σ (forgetting to divide σ by √n).
Correct: Z = (x̄ − μ0) / (σ / √n). The SE of the mean is σ/√n, not σ.

This is the most common arithmetic error in Normal mean tests. The sample mean has standard deviation σ/√n, much smaller than the population SD σ.

Key Formulas

Correlation

r = Sxy / √(Sxx · Syy)

Sxx = Σx² − n x̄²  |  Syy = Σy² − n ȳ²  |  Sxy = Σxy − n x̄ ȳ

−1 ≤ r ≤ 1

Regression Line (y on x)

y = a + bx
b = Sxy / Sxx  |  a = ȳ − b x̄
Passes through (x̄, ȳ)

Hypothesis Test Structure

Step | What to Write
1 | State H0 and H1 with parameter and value
2 | State the distribution under H0: X ~ B(n, p0) or x̄ ~ N(μ0, σ²/n)
3 | Calculate the test statistic (p-value or Z)
4 | Compare with the significance level α or critical value
5 | State the conclusion in context

Critical Values — Normal Distribution N(0,1)

Test Type | Significance Level | Critical Value(s)
One-tailed (right) | 5% | z = 1.645
One-tailed (right) | 1% | z = 2.326
Two-tailed | 5% | z = ±1.96
Two-tailed | 1% | z = ±2.576
One-tailed (left) | 5% | z = −1.645

Normal Mean Test Statistic

Z = (x̄ − μ0) / (σ / √n)   where x̄ ~ N(μ0, σ²/n) under H0

Binomial Hypothesis Test

Under H0: X ~ B(n, p0)
One-tailed right: p-value = P(X ≥ xobs)
One-tailed left: p-value = P(X ≤ xobs)
Two-tailed: compare min(P(X ≤ x), P(X ≥ x)) with α/2

Interpreting Correlation

r value | Interpretation
r = 1 | Perfect positive linear correlation
0.8 ≤ r < 1 | Strong positive correlation
0.5 ≤ r < 0.8 | Moderate positive correlation
0 < r < 0.5 | Weak positive correlation
r = 0 | No linear correlation
−0.5 < r < 0 | Weak negative correlation
−0.8 < r ≤ −0.5 | Moderate negative correlation
−1 < r ≤ −0.8 | Strong negative correlation
r = −1 | Perfect negative linear correlation

Proof Bank

Proof 1 — The Regression Line Passes Through (x̄, ȳ)

The least squares regression line y = a + bx is found by minimising S = Σ(yi − a − bxi)².

Step 1: Differentiate S with respect to a and set equal to zero:

∂S/∂a = −2 Σ(yi − a − bxi) = 0

⇒ Σyi = na + b Σxi

⇒ ȳ = a + b x̄   (dividing both sides by n)

Step 2: This equation says ȳ = a + b x̄, which means the point (x̄, ȳ) lies exactly on the line y = a + bx.

Conclusion: The regression line always passes through the mean point (x̄, ȳ). This is a consequence of the least squares condition and holds for all data sets.

Proof 2 — r = 0 Does Not Mean No Relationship

Pearson's r measures only linear association. It is possible for r = 0 (or close to 0) while a strong non-linear relationship exists.

Demonstration: Consider data points at (x, y): (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4). These lie exactly on the parabola y = x².

Σx = 0, Σy = 10, Σxy = 0 (products of positive and negative values cancel).

Sxy = Σxy − n x̄ ȳ = 0 − 5(0)(2) = 0 ⇒ r = 0.

Conclusion: r = 0 means no linear correlation. There may still be a non-linear relationship. Always look at the scatter diagram and not just the value of r.
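The demonstration is easy to verify numerically; a short Python check of the parabola data:

```python
import math

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]          # points lying exactly on y = x²

n = len(xs)
x_bar = sum(xs) / n               # 0.0
y_bar = sum(ys) / n               # 2.0
sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
sxx = sum(x * x for x in xs) - n * x_bar ** 2
syy = sum(y * y for y in ys) - n * y_bar ** 2
r = sxy / math.sqrt(sxx * syy)
print(r)   # 0.0: no linear correlation, despite the exact quadratic relationship
```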

Proof 3 — Formula for Gradient b

The least squares gradient b minimises Σ(yi − a − bxi)². After finding a = ȳ − b x̄, substitute back and differentiate with respect to b:

∂S/∂b = −2 Σxi(yi − ȳ − b(xi − x̄)) = 0

⇒ Σxi(yi − ȳ) = b Σxi(xi − x̄)

⇒ Sxy = b · Sxx

b = Sxy / Sxx

This confirms the formula for the regression gradient as stated in the syllabus.

Proof 4 — Why |r| ≤ 1 (Cauchy-Schwarz)

By the Cauchy-Schwarz inequality applied to vectors ui = xi − x̄ and vi = yi − ȳ:

(Σ ui vi)² ≤ (Σ ui²)(Σ vi²)

i.e., Sxy² ≤ Sxx · Syy

Dividing both sides by Sxx · Syy (both positive):

r² = Sxy² / (Sxx · Syy) ≤ 1

Therefore −1 ≤ r ≤ 1. Equality holds when all (xi − x̄) are proportional to (yi − ȳ), i.e., when all points lie on a straight line.

Scatter Plot Visualiser

Enter 5 data points (x, y). The tool will plot them, draw the estimated regression line, and compute r.


Exercise 1 — Correlation Interpretation (10 questions)

Exercise 2 — Regression Line Calculations (10 questions)

Exercise 3 — Hypothesis Test Setup: H₀, H₁, Tails (10 questions)

Exercise 4 — One-Tailed Binomial Hypothesis Tests (10 questions)

Exercise 5 — Two-Tailed Tests and Critical Regions (10 questions)

Practice — 30 Mixed Questions

Challenge — 15 Harder Questions

Exam Style Questions (8 Cambridge S1 Style)

Question 1 [5 marks]

The following data on advertising spend (x, £hundreds) and sales (y, £thousands) for 6 months are summarised as: Σx = 42, Σy = 78, Σx² = 330, Σy² = 1062, Σxy = 582, n = 6.

(i) Find Sxx, Syy and Sxy. [3]

(ii) Calculate the product moment correlation coefficient r. [2]

x̄ = 7, ȳ = 13
Sxx = 330 − 6(49) = 330 − 294 = 36 [M1 A1]
Syy = 1062 − 6(169) = 1062 − 1014 = 48 [A1]
Sxy = 582 − 6(7)(13) = 582 − 546 = 36 [A1]
r = 36 / √(36 × 48) = 36 / √1728 = 36 / 41.57 ≈ 0.866 [M1 A1]
This indicates a strong positive linear correlation between advertising spend and sales.

Question 2 [6 marks]

For a sample of 8 data points: Sxy = 64, Sxx = 80, x̄ = 4.5, ȳ = 9.2.

(i) Find the equation of the regression line y = a + bx. [3]

(ii) Predict y when x = 6. [1]

(iii) Would it be sensible to use your line to predict y when x = 25? Justify your answer. [2]

(i) b = 64/80 = 0.8 [M1 A1]
a = 9.2 − 0.8(4.5) = 9.2 − 3.6 = 5.6 [M1 A1]
y = 5.6 + 0.8x [A1]
(ii) y = 5.6 + 0.8(6) = 5.6 + 4.8 = 10.4 [B1]
(iii) No — x = 25 is beyond the range of the data. This is extrapolation and the linear relationship may not hold outside the observed range. [B1 B1]

Question 3 [5 marks]

A company claims that 25% of customers prefer product A. In a random sample of 18 customers, only 1 prefers product A. Test at 5% significance whether the proportion preferring product A has decreased.

H0: p = 0.25   H1: p < 0.25 (one-tailed) [B1]
Under H0: X ~ B(18, 0.25) [M1]
p-value = P(X ≤ 1) = P(X=0) + P(X=1)
= (0.75)¹⁸ + 18(0.25)(0.75)¹⁷
≈ 0.00564 + 0.03383 = 0.0395 [M1 A1]
0.0395 < 0.05 ⇒ Reject H0 [M1]
"There is sufficient evidence at 5% to conclude the proportion preferring product A has decreased from 0.25." [A1]

Question 4 [6 marks]

A random variable X ~ B(20, p). Test H0: p = 0.35 against H1: p ≠ 0.35 at 5%. A sample gives x = 12.

(i) State the distribution of X under H0 and identify the relevant tail. [2]

(ii) Find the p-value. [2]

(iii) State your conclusion in context. [2]

(i) Under H0: X ~ B(20, 0.35). Mean = 7, x = 12 > 7 so upper tail. [B1 B1]
(ii) P(X ≥ 12) = 1 − P(X ≤ 11) = 1 − 0.9804 = 0.0196 [M1 A1]
(iii) 0.0196 < 0.025 (half of 5%) ⇒ Reject H0.
"There is sufficient evidence at 5% to conclude that p has changed from 0.35." [M1 A1]

Question 5 [5 marks]

Find the critical region for testing H0: p = 0.2 against H1: p > 0.2 using B(25, 0.2) at the 5% significance level. State the actual significance level.

Find smallest c: P(X ≥ c) ≤ 0.05 [M1]
P(X ≥ 9) = 1 − P(X ≤ 8) = 1 − 0.9532 = 0.0468 ≤ 0.05 ✓ [M1 A1]
P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − 0.8910 = 0.1090 > 0.05 ✗
Critical region: X ≥ 9 [A1]
Actual significance level = P(X ≥ 9) = 4.68% [A1]

Question 6 [6 marks]

Weights of apples are normally distributed with standard deviation 15 g. A sample of 36 apples has mean 182 g. The orchard claims the mean weight is 185 g. Test this claim at 5% (two-tailed).

H0: μ = 185   H1: μ ≠ 185 [B1 B1]
Z = (182 − 185) / (15/√36) = −3 / 2.5 = −1.2 [M1 A1]
Critical region: |Z| > 1.96 at 5% two-tailed [B1]
|−1.2| = 1.2 < 1.96 ⇒ Do not reject H0 [M1]
"Insufficient evidence at 5% to conclude that the mean apple weight differs from 185 g." [A1]

Question 7 [5 marks]

A biologist records temperature x (°C) and enzyme activity y for 10 samples. The results give r = 0.73. The biologist says "Higher temperature causes higher enzyme activity". Comment on this statement.

r = 0.73 indicates a moderate to strong positive linear correlation. [B1 B1]
However, correlation does not imply causation. [B1]
There may be a confounding variable, or the relationship may be coincidental. [B1]
The biologist should not claim causation based on correlation alone. [B1]

Question 8 [7 marks]

A machine fills bottles of water. The volume filled (ml) is N(μ, σ²) with σ = 5 ml. The machine is set to fill μ = 500 ml. A quality controller suspects the mean has changed. She takes a sample of 25 bottles and records a mean of 502.4 ml.

(i) Write down H0 and H1. [2]

(ii) Carry out the test at the 5% significance level. [3]

(iii) Find the set of values of x̄ that would lead to rejection of H0. [2]

(i) H0: μ = 500   H1: μ ≠ 500 [B1 B1]
(ii) Z = (502.4 − 500) / (5/√25) = 2.4/1 = 2.4 [M1 A1]
|Z| = 2.4 > 1.96 ⇒ Reject H0. "Sufficient evidence at 5% that the mean volume has changed from 500 ml." [M1 A1]
(iii) Critical region in terms of x̄: x̄ < 500 − 1.96 × (5/√25) or x̄ > 500 + 1.96 × (5/√25) ⇒ x̄ < 498.04 or x̄ > 501.96 [M1 A1]

Past Paper Questions

Past Paper 1 — Correlation and Regression (Cambridge 9709 S1 style)

Data on 7 students: time spent on homework x (hours/week) and test score y (%). Summary statistics: Σx = 35, Σy = 455, Σx² = 199, Σy² = 29855, Σxy = 2345.

(i) Calculate Sxx, Syy and Sxy. [3]

(ii) Find r and comment on its value. [3]

(iii) Find the regression line y on x. [3]

(iv) Predict the score for a student doing 6 hours per week. State whether this is interpolation or extrapolation. [2]

x̄ = 5, ȳ = 65
Sxx = 199 − 7(25) = 199 − 175 = 24 [A1]
Syy = 29855 − 7(4225) = 29855 − 29575 = 280 [A1]
Sxy = 2345 − 7(5)(65) = 2345 − 2275 = 70 [A1]
r = 70/√(24×280) = 70/√6720 = 70/81.97 ≈ 0.854 [M1 A1]
"Strong positive linear correlation between homework time and test score." [B1]
b = 70/24 ≈ 2.917, a = 65 − 2.917(5) = 65 − 14.58 = 50.42 [M1 A1 A1]
Regression line: y = 50.42 + 2.917x [A1]
When x = 6: y = 50.42 + 2.917(6) = 50.42 + 17.5 = 67.9 [B1]
x = 6 is close to x̄ = 5 and lies within the range of the observed data, so this is interpolation and the prediction is reliable [B1]

Past Paper 2 — Binomial Hypothesis Test (Cambridge 9709 S1 style)

Historically, 30% of emails received by an office are spam. Following installation of a new filter, a sample of 20 emails contains 2 spam emails. Test at 5% whether the proportion of spam has decreased.

H0: p = 0.3   H1: p < 0.3 (one-tailed left) [B1 B1]
Under H0: X ~ B(20, 0.3) [B1]
p-value = P(X ≤ 2) = P(0) + P(1) + P(2)
= (0.7)²⁰ + 20(0.3)(0.7)¹⁹ + C(20,2)(0.3)²(0.7)¹⁸
≈ 0.000798 + 0.006839 + 0.027846 ≈ 0.0355 [M1 A1]
0.0355 < 0.05 ⇒ Reject H0 [M1]
"Sufficient evidence at 5% that the proportion of spam has decreased from 0.3." [A1]

Past Paper 3 — Critical Region (Cambridge 9709 S1 style)

A biased coin has P(heads) = p. In 15 tosses, 13 show heads. Find the critical region for testing H0: p = 0.6 against H1: p > 0.6 at 5%. Is the observed result in the critical region?

Under H0: X ~ B(15, 0.6) [B1]
Find smallest c: P(X ≥ c) ≤ 0.05:
P(X ≥ 13) = P(13) + P(14) + P(15) = C(15,13)(0.6)¹³(0.4)² + C(15,14)(0.6)¹⁴(0.4) + (0.6)¹⁵
≈ 0.0219 + 0.0047 + 0.0005 = 0.0271 ≤ 0.05 ✓ [M1 A1]
P(X ≥ 12) ≈ 0.0905 > 0.05 ✗
Critical region: X ≥ 13 [A1]
Observed x = 13 ≥ 13 ⇒ In critical region ⇒ Reject H0 [B1]
"Sufficient evidence at 5% that p > 0.6." [B1]

Past Paper 4 — Normal Mean Test (Cambridge 9709 S1 style)

The breaking strength of cables produced by a factory is normally distributed with mean μ and standard deviation 12 N. The target mean is 80 N. A new batch of 49 cables has mean breaking strength 77.2 N. Test at 1% significance whether the mean breaking strength has fallen below the target.

H0: μ = 80   H1: μ < 80 (one-tailed left) [B1 B1]
Z = (77.2 − 80) / (12/√49) = −2.8 / (12/7) = −2.8 / 1.714 = −1.633 [M1 A1]
Critical value at 1% one-tailed: z = −2.326 [B1]
−1.633 > −2.326 ⇒ Do not reject H0 [M1]
"Insufficient evidence at 1% to conclude mean breaking strength has fallen below 80 N." [A1]

Past Paper 5 — Mixed: Regression and Hypothesis Test (Cambridge 9709 S1 style)

For 10 pairs of data, Sxy = −42, Sxx = 60, Syy = 35, x̄ = 8, ȳ = 14.

(i) Calculate r. Interpret this in context where x = hours of TV watched daily and y = reading test score. [3]

(ii) Find the regression line y on x. [3]

(iii) A student watches 5 hours of TV per day. Predict their reading score and state whether this is reliable. [2]

(i) r = −42/√(60×35) = −42/√2100 = −42/45.826 ≈ −0.917 [M1 A1]
"Strong negative linear correlation — students who watch more TV tend to have lower reading scores." [B1]
(ii) b = −42/60 = −0.7, a = 14 − (−0.7)(8) = 14 + 5.6 = 19.6 [M1 A1 A1]
Regression line: y = 19.6 − 0.7x [A1]
(iii) y = 19.6 − 0.7(5) = 19.6 − 3.5 = 16.1 [B1]
If data range includes x = 5, this is interpolation and reliable; if not, it is extrapolation. [B1]