Grade 11 · Statistics 1 · Cambridge A-Level 9709 · Age 16–17
This topic covers the final and most powerful ideas in Cambridge A-Level Statistics 1 (9709): measuring the strength and direction of relationships between variables using correlation, finding the line of best fit using linear regression, and drawing evidence-based conclusions using hypothesis testing. These tools are used throughout science, economics, medicine, and data analysis.
Product moment correlation coefficient r: measures linear association; r = ±1 is perfect, r = 0 means no linear correlation
Scatter diagram: visual display of bivariate data; look for pattern, direction, and outliers
Regression line: y = a + bx (least squares); passes through (x̄, ȳ)
Extrapolation: using the line outside the data range; unreliable and not recommended
H0 and H1: null hypothesis (assumed true) vs alternative (what we want to test)
Critical region: values of the test statistic that lead to rejecting H0
Binomial test: test p using B(n, p0); compare tail probability with α
Normal mean test: standardise x̄ using Z = (x̄ − μ0) / (σ/√n); compare with zα
Correlation measures the strength and direction of a linear relationship between two variables x and y. It is quantified by Pearson's product moment correlation coefficient r, which always lies between −1 and 1 (inclusive).
The formula uses three summary statistics: Sxx, Syy, and Sxy.
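For reference, the usual textbook definitions of the three summary statistics and of r are written out below. The source names the quantities without displaying them, so the layout here is a reconstruction of the standard form rather than a quotation.

```latex
\begin{align*}
S_{xx} &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}, \\
S_{yy} &= \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}, \\
S_{xy} &= \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}, \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}.
\end{align*}
```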
Before calculating r, always draw (or inspect) a scatter diagram: a graph where each data point (x, y) is plotted as a dot. Look for the overall pattern, the direction of any trend, and any outliers.
When there is a linear relationship between x and y, we can find the least squares regression line y = a + bx. This line minimises the sum of squared vertical distances from each point to the line — giving the "best fit" straight line.
The least squares regression line always passes through the mean point (x̄, ȳ). This is because a = ȳ − b x̄, so substituting x = x̄ gives y = a + b x̄ = ȳ. Use this as a check.
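The calculation can be sketched in a few lines of Python. This is a minimal illustration, not an exam method: the function name `regression_line` is hypothetical, and the numbers are the summary statistics from one of the worked examples in this topic (Sxy = 126, Sxx = 84, x̄ = 5, ȳ = 12).

```python
def regression_line(sxy, sxx, x_bar, y_bar):
    """Return (a, b) for the least squares line y = a + b*x."""
    b = sxy / sxx            # gradient: b = Sxy / Sxx
    a = y_bar - b * x_bar    # intercept: a = y_bar - b * x_bar
    return a, b

a, b = regression_line(sxy=126, sxx=84, x_bar=5, y_bar=12)
print(f"y = {a} + {b}x")     # y = 4.5 + 1.5x

# Check: substituting x = x_bar must return y_bar.
assert abs((a + b * 5) - 12) < 1e-12
```

The final assertion is the same check recommended above: the fitted line must pass through the mean point.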
A hypothesis test uses sample data to decide whether there is sufficient evidence to reject an assumption about a population. It provides a formal framework for drawing conclusions under uncertainty.
The significance level α is the probability of incorrectly rejecting H0 when it is actually true (a Type I error). Common values: α = 0.05 (5%) or α = 0.01 (1%).
Calculate the probability of observing a result as extreme as (or more extreme than) the data, assuming H0 is true. This is the p-value.
When we observe X successes in n trials and want to test a claimed probability p0, we use the model X ~ B(n, p0) under H0.
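A tail probability under B(n, p0) can be computed directly from the binomial formula. The sketch below uses only the Python standard library; the helper names `binom_pmf`, `upper_tail` and `lower_tail` are hypothetical. The numbers come from the coin example in this topic (10 tosses, 8 heads, p0 = 0.5, right-tailed test).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

p_value = upper_tail(8, 10, 0.5)
print(round(p_value, 4))   # 0.0547
print(p_value < 0.05)      # False, so H0 is not rejected at 5%
```

Since 0.0547 is greater than 0.05, there is insufficient evidence at the 5% level that the coin is biased towards heads.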
When a population is normally distributed with mean μ and known variance σ², and we take a sample of size n, the sample mean x̄ has distribution x̄ ~ N(μ, σ²/n).
We can test whether the true population mean μ equals a specific value μ0.
Under H0, standardise the sample mean: Z = (x̄ − μ0) / (σ/√n).
Under H0, Z ~ N(0,1). Compare Z with critical values from the standard Normal table.
| Test type | Level α | Critical value | Reject H0 if |
|---|---|---|---|
| One-tailed (right) | 5% | 1.645 | Z > 1.645 |
| One-tailed (left) | 5% | −1.645 | Z < −1.645 |
| Two-tailed | 5% | ±1.96 | \|Z\| > 1.96 |
| One-tailed (right) | 1% | 2.326 | Z > 2.326 |
| Two-tailed | 1% | ±2.576 | \|Z\| > 2.576 |
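The Normal mean test can be sketched with Python's standard library, which also reproduces the critical values in the table. The function names `z_statistic` and `critical_value` are hypothetical; the data are from the heights example in this topic (σ² = 49, n = 36, x̄ = 172.8, μ0 = 175, two-tailed at 5%).

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(x_bar, mu0, sigma, n):
    """Standardise the sample mean using the standard error sigma / sqrt(n)."""
    return (x_bar - mu0) / (sigma / sqrt(n))

def critical_value(alpha, two_tailed):
    """Positive critical z-value; alpha is halved for a two-tailed test."""
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

z = z_statistic(x_bar=172.8, mu0=175, sigma=7, n=36)
z_crit = critical_value(0.05, two_tailed=True)
reject = abs(z) > z_crit
print(round(z, 3), round(z_crit, 3), reject)   # -1.886 1.96 False
```

Here |Z| = 1.886 is less than 1.96, so H0: μ = 175 is not rejected at the 5% level.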
r = −0.87. Interpret this value in the context of the study: x = temperature (°C), y = hot drink sales per day.
Given Sxy = 126, Sxx = 84, x̄ = 5, ȳ = 12. (i) Find the regression line. (ii) Predict y when x = 7. (iii) Interpret the gradient.
A coin is claimed to be fair. It is tossed 10 times and shows 8 heads. Test at the 5% level whether the coin is biased towards heads (H1: p > 0.5).
A thumbtack lands point-up with probability p. In 15 trials, it lands point-up 3 times. Test H0: p = 0.4 against H1: p ≠ 0.4 at 5%.
X ~ B(20, 0.25) under H0. Find the critical region for H1: p < 0.25 at 5%.
Heights of adult males are N(μ, 49). A sample of 36 gives x̄ = 172.8 cm. Test H0: μ = 175 at 5% (two-tailed).
Under H0: X ~ B(12, 0.3). Observed x = 7. H1: p > 0.3. Find the p-value and state the conclusion at 10%.
The regression line for predicting salary (y, £thousands) from years of experience (x) is y = 18.4 + 2.3x. Interpret a and b.
The direction of H1 tells you which tail to use. Right-tail tests use P(X ≥ observed), left-tail tests use P(X ≤ observed).
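The same tail logic gives the critical region for a one-tailed binomial test. The sketch below is one possible approach using only the standard library; the function names are hypothetical. The lower-tail call uses B(20, 0.25) at 5% and the upper-tail call uses B(10, 0.5) at 5%, both taken from examples in this topic.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p); returns 0 when x < 0."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def lower_critical_region(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha, or None if there is none."""
    c = None
    for x in range(n + 1):
        if binom_cdf(x, n, p0) <= alpha:
            c = x
        else:
            break
    return c

def upper_critical_region(n, p0, alpha):
    """Smallest c with P(X >= c) <= alpha, or None if there is none."""
    for x in range(n + 1):
        if 1 - binom_cdf(x - 1, n, p0) <= alpha:
            return x
    return None

print(lower_critical_region(20, 0.25, 0.05))  # 1, so the region is X <= 1
print(upper_critical_region(10, 0.5, 0.05))   # 9, so the region is X >= 9
```

For the coin example this means 8 heads in 10 tosses lies just outside the critical region.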
Cambridge exam mark schemes almost always require the conclusion to reference the specific context — variable name, direction, and original claimed value.
Be careful when p > 0.5 — the distribution is skewed left, so observed values less than the mean must use the lower tail probability.
Cambridge questions often ask you to comment on the reliability of a prediction — always check whether the x-value is inside or outside the data range.
Confusing correlation with causation is a classic exam trap. Always use "correlation" language, never "cause" language, unless the question explicitly asks about causation.
The 5% is split equally between both tails in a two-tailed test — so you compare tail probabilities with 0.025 (or use critical values ±1.96 for Normal tests).
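A two-tailed binomial test can be sketched as follows, comparing the relevant tail with α/2. The function names are hypothetical, and choosing the tail by comparing the observed value with the mean np0 is an assumption of this sketch. The data are from the thumbtack example in this topic (n = 15, x = 3, p0 = 0.4, α = 0.05).

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def two_tailed_binomial_test(x, n, p0, alpha):
    """Return (tail probability, reject H0?) using alpha / 2 in each tail."""
    if x <= n * p0:
        tail = binom_cdf(x, n, p0)            # lower tail: P(X <= x)
    else:
        tail = 1 - binom_cdf(x - 1, n, p0)    # upper tail: P(X >= x)
    return tail, tail < alpha / 2

tail, reject = two_tailed_binomial_test(3, 15, 0.4, 0.05)
print(round(tail, 4), reject)   # 0.0905 False
```

Because 0.0905 is greater than 0.025, H0: p = 0.4 is not rejected.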
A hypothesis test never proves H0 is true — it either provides enough evidence to reject it, or not enough. Say "do not reject" rather than "accept".
Using σ instead of σ/√n is the most common arithmetic error in Normal mean tests. The sample mean has standard deviation σ/√n, which is smaller than the population standard deviation σ whenever n > 1.
| Step | What to Write |
|---|---|
| 1 | State H0 and H1 with parameter and value |
| 2 | State the distribution under H0: X ~ B(n, p0) or x̄ ~ N(μ0, σ²/n) |
| 3 | Calculate the tail probability (p-value) or the test statistic Z |
| 4 | Compare with significance level (α) or critical value |
| 5 | State conclusion in context |
| Test Type | Significance Level | Critical Value(s) |
|---|---|---|
| One-tailed (right) | 5% | z = 1.645 |
| One-tailed (right) | 1% | z = 2.326 |
| Two-tailed | 5% | z = ±1.96 |
| Two-tailed | 1% | z = ±2.576 |
| One-tailed (left) | 5% | z = −1.645 |
| r value | Interpretation |
|---|---|
| r = 1 | Perfect positive linear correlation |
| 0.8 ≤ r < 1 | Strong positive correlation |
| 0.5 ≤ r < 0.8 | Moderate positive correlation |
| 0 < r < 0.5 | Weak positive correlation |
| r = 0 | No linear correlation |
| −0.5 < r < 0 | Weak negative correlation |
| −0.8 < r ≤ −0.5 | Moderate negative correlation |
| −1 < r ≤ −0.8 | Strong negative correlation |
| r = −1 | Perfect negative linear correlation |
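The table translates directly into a small lookup function. This is only a sketch of the bands listed above (the name `describe_r` is hypothetical), and the cut-offs 0.5 and 0.8 are the conventions used in this table rather than universal rules.

```python
def describe_r(r):
    """Describe r using the bands from the interpretation table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1 inclusive")
    if r == 0:
        return "No linear correlation"
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"Perfect {sign} linear correlation"
    if size >= 0.8:
        strength = "Strong"
    elif size >= 0.5:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {sign} correlation"

print(describe_r(-0.87))   # Strong negative correlation
```

An exam answer still needs the context added, for example "as temperature increases, hot drink sales tend to decrease".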
The least squares regression line y = a + bx is found by minimising S = Σ(yi − a − bxi)².
Step 1: Differentiate S with respect to a and set equal to zero:
∂S/∂a = −2 Σ(yi − a − bxi) = 0
⇒ Σyi = na + b Σxi
⇒ ȳ = a + b x̄ (dividing both sides by n)
Step 2: This equation says ȳ = a + b x̄, which means the point (x̄, ȳ) lies exactly on the line y = a + bx.
Conclusion: The regression line always passes through the mean point (x̄, ȳ). This is a consequence of the least squares condition and holds for all data sets.
Pearson's r measures only linear association. It is possible for r = 0 (or close to 0) while a strong non-linear relationship exists.
Demonstration: Consider data points at (x, y): (−2, 4), (−1, 1), (0, 0), (1, 1), (2, 4). These lie exactly on the parabola y = x².
Σx = 0, Σy = 10, Σxy = 0 (products of positive and negative values cancel).
Sxy = Σxy − n x̄ ȳ = 0 − 5(0)(2) = 0 ⇒ r = 0.
Conclusion: r = 0 means no linear correlation. There may still be a non-linear relationship. Always look at the scatter diagram and not just the value of r.
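The demonstration can be checked numerically. The sketch below computes r from raw data using the deviation form of Sxx, Syy and Sxy; the function name `pearson_r` is hypothetical, and the data are the five points on y = x² given above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient from raw data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]     # points lie exactly on y = x^2
print(pearson_r(xs, ys))      # 0.0
```

The relationship is exact, yet r is zero because the relationship is not linear.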
The least squares gradient b minimises Σ(yi − a − bxi)². After finding a = ȳ − b x̄, substitute back and differentiate with respect to b:
∂S/∂b = −2 Σ(xi − x̄)(yi − ȳ − b(xi − x̄)) = 0
⇒ Σ(xi − x̄)(yi − ȳ) = b Σ(xi − x̄)²
⇒ Sxy = b · Sxx
⇒ b = Sxy / Sxx
This confirms the formula for the regression gradient as stated in the syllabus.
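A quick numerical check of the result: with a and b computed from the formulas, moving the gradient or intercept away from those values should always increase the sum of squared residuals. The data below are made up purely for illustration, and the function names are hypothetical.

```python
def sum_sq_residuals(xs, ys, a, b):
    """S = sum of (y_i - a - b*x_i)^2."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Return (a, b) using b = Sxy / Sxx and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Made-up illustrative data (not from the source).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

a, b = least_squares(xs, ys)
best = sum_sq_residuals(xs, ys, a, b)

# Perturbing either parameter makes the fit worse.
for delta in (-0.1, 0.1):
    assert sum_sq_residuals(xs, ys, a, b + delta) > best
    assert sum_sq_residuals(xs, ys, a + delta, b) > best
```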
By the Cauchy-Schwarz inequality applied to vectors ui = xi − x̄ and vi = yi − ȳ:
(Σ ui vi)² ≤ (Σ ui²)(Σ vi²)
i.e., Sxy² ≤ Sxx · Syy
Dividing both sides by Sxx · Syy (both positive):
r² = Sxy² / (Sxx · Syy) ≤ 1
Therefore −1 ≤ r ≤ 1. Equality holds when all (xi − x̄) are proportional to (yi − ȳ), i.e., when all points lie on a straight line.
The following data on advertising spend (x, £hundreds) and sales (y, £thousands) for 6 months are summarised as: Σx = 42, Σy = 78, Σx² = 330, Σy² = 1062, Σxy = 588, n = 6.
(i) Find Sxx, Syy and Sxy. [3]
(ii) Calculate the product moment correlation coefficient r. [2]
For a sample of 8 data points: Sxy = 64, Sxx = 80, x̄ = 4.5, ȳ = 9.2.
(i) Find the equation of the regression line y = a + bx. [3]
(ii) Predict y when x = 6. [1]
(iii) Would it be sensible to use your line to predict y when x = 25? Justify your answer. [2]
A company claims that 25% of customers prefer product A. In a random sample of 18 customers, only 2 prefer product A. Test at 5% significance whether the proportion preferring product A has decreased.
A random variable X ~ B(20, p). Test H0: p = 0.35 against H1: p ≠ 0.35 at 5%. A sample gives x = 12.
(i) State the distribution of X under H0 and identify the relevant tail. [2]
(ii) Find the p-value. [2]
(iii) State your conclusion in context. [2]
Find the critical region for testing H0: p = 0.2 against H1: p > 0.2 using B(25, 0.2) at the 5% significance level. State the actual significance level.
Weights of apples are normally distributed with standard deviation 15 g. A sample of 36 apples has mean 182 g. The orchard claims the mean weight is 185 g. Test this claim at 5% (two-tailed).
A biologist records temperature x (°C) and enzyme activity y for 10 samples. The results give r = 0.73. The biologist says "Higher temperature causes higher enzyme activity". Comment on this statement.
A machine fills bottles of water. The volume filled (ml) is N(μ, σ²) with σ = 5 ml. The machine is set to fill μ = 500 ml. A quality controller suspects the mean has changed. She takes a sample of 25 bottles and records a mean of 502.4 ml.
(i) Write down H0 and H1. [2]
(ii) Carry out the test at the 5% significance level. [3]
(iii) Find the set of values of x̄ that would lead to rejection of H0. [2]
Data on 7 students: time spent on homework x (hours/week) and test score y (%). Summary statistics: Σx = 35, Σy = 455, Σx² = 199, Σy² = 29855, Σxy = 2345.
(i) Calculate Sxx, Syy and Sxy. [3]
(ii) Find r and comment on its value. [3]
(iii) Find the regression line y on x. [3]
(iv) Predict the score for a student doing 6 hours per week. State whether this is interpolation or extrapolation. [2]
Historically, 30% of emails received by an office are spam. Following installation of a new filter, a sample of 20 emails contains 2 spam emails. Test at 5% whether the proportion of spam has decreased.
A biased coin has P(heads) = p. In 15 tosses, 13 show heads. Find the critical region for testing H0: p = 0.6 against H1: p > 0.6 at 5%. Is the observed result in the critical region?
The breaking strength of cables produced by a factory is normally distributed with mean μ and standard deviation 12 N. The target mean is 80 N. A new batch of 49 cables has mean breaking strength 77.2 N. Test at 1% significance whether the mean breaking strength has fallen below the target.
For 10 pairs of data, Sxy = −42, Sxx = 60, Syy = 35, x̄ = 8, ȳ = 14.
(i) Calculate r. Interpret this in context where x = hours of TV watched daily and y = reading test score. [3]
(ii) Find the regression line y on x. [3]
(iii) A student watches 5 hours of TV per day. Predict their reading score and state whether this is reliable. [2]