IB Mathematics - Questionbank

4.2 Correlation & Regression

Question 1

The following table shows the data collected from an experiment.

The data is also represented on the following scatter diagram.

The relationship between `x` and `y` can be modelled by the regression line of `y` on `x` with equation `y=ax+b`, where `a,b in RR`.

(a) Write down the value of `a` and the value of `b`.

(b) Use this model to predict the value of `y` when `x=18`.

(c) Write down the value of `bar x` and the value of `bar y`.

(d) Draw the line of best fit on the scatter diagram.
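
As a rough illustration of what a calculator does for parts (a) and (c), the sketch below fits a least-squares regression line of `y` on `x` and checks that it passes through the mean point `(bar x, bar y)`, which is what makes part (d) straightforward to draw. The helper name `fit_line` and the data values are hypothetical; they are not the values from the question's table.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares regression line of y on x: returns (a, b, r) for y = a*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx              # gradient
    b = y_bar - a * x_bar      # intercept, so the line passes through the mean point
    r = sxy / sqrt(sxx * syy)  # Pearson's product-moment correlation coefficient
    return a, b, r

# Hypothetical data for illustration only, NOT the question's table.
xs = [2, 4, 6, 8, 10]
ys = [5, 9, 12, 18, 21]

a, b, r = fit_line(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(a, b, r)
print(a * x_bar + b, y_bar)  # the two values agree up to rounding
```

Plotting the mean point and one other point on the line (for example the `y`-intercept) is then enough to draw the line of best fit.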

Question 2

At a café, the waiting time between ordering and receiving a cup of coffee is dependent upon the number of customers who have already ordered their coffee and are waiting to receive it.

Sarah, a regular customer, visited the café on five consecutive days. The following table shows the number of customers, `x`, ahead of Sarah who have already ordered and are waiting to receive their coffee and Sarah’s waiting time, `y` minutes.

The relationship between `x` and `y` can be modelled by the regression line of `y` on `x` with equation `y=ax+b`.

(a) 

(i) Find the value of `a` and the value of `b`.

(ii) Write down the value of Pearson’s product-moment correlation coefficient, `r`.

(b) Interpret, in context, the value of `a` found in part (a)(i).

On another day, Sarah visits the café to order a coffee. Seven customers have already ordered their coffee and are waiting to receive it.

(c) Use the result from part (a)(i) to estimate Sarah’s waiting time to receive her coffee.

Question 3

In Lucy’s music academy, eight students took their piano diploma examination and achieved scores out of 150. For her records, Lucy decided to record the average number of hours per week each student reported practicing in the weeks prior to their examination. These results are summarized in the table below.

(a) Find Pearson’s product-moment correlation coefficient, `r`, for these data.

(b) The relationship between the variables can be modelled by the regression equation `D = ah + b`. Write down the value of `a` and the value of `b`.

(c) One of these eight students was disappointed with her result and wished she had practiced more. Based on the given data, determine how her score could have been expected to alter had she practiced an extra five hours per week.

Question 4

The following table shows the Mathematics test scores `(x)` and the Science test scores `(y)` for a group of eight students.

The regression line of `y` on `x` for this data can be written in the form `y=ax+b`.

(a) Find the value of `a` and the value of `b`.

(b) Write down the value of the Pearson’s product-moment correlation coefficient, `r`.

(c) Use the equation of your regression line to predict the Science test score for a student who has a score of 78 on the Mathematics test. Express your answer to the nearest integer.

Question 5

A botanist is conducting an experiment which studies the growth of plants.

The heights of the plants are measured on seven different days.

The following table shows the number of days, `d`, that the experiment has been running and the average height, `h` cm, of the plants on each of those days.

(a) The regression line of `h` on `d` for this data can be written in the form `h = ad + b`.

Find the value of `a` and the value of `b`.

(b) Write down the value of the Pearson’s product-moment correlation coefficient, `r`.

(c) Use your regression line to estimate the average height of the plants when the experiment has been running for 20 days.

Question 6

A class is given two tests, Test A and Test B. Each test is scored out of a total of 100 marks. The scores of the students are shown in the following table.

Let `x` be the score on Test A and `y` be the score on Test B.

The teacher finds that the equation of the regression line of `y` on `x` for these scores is `y = 0.822x + 18.4`.

(a) Find the value of Pearson’s product-moment correlation coefficient, `r`.

 

Giovanni was absent for Test A and Paulo was absent for Test B.

The teacher uses the regression line of `y` on `x` to estimate the missing scores.

Paulo scored 10 on Test A.

The teacher estimated his score on Test B to be 27 to the nearest integer using the following calculation:

`y = 0.822(10) + 18.4 ~~ 27`

(b) Give a reason why this method is not appropriate for Paulo. 

 

Giovanni scored 90 on Test B.

The teacher estimated his score on Test A to be 87 to the nearest integer using the following calculation:

`90 = 0.822x + 18.4`, so `x = frac{90-18.4}{0.822}~~87`
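
The teacher's two calculations can be reproduced with a few lines of arithmetic. This sketch only checks the numbers quoted above; it says nothing about whether either method is appropriate, which is what parts (b) and (c) ask. The variable names are illustrative.

```python
# Coefficients of the regression line of y on x, as given in the question.
a, b = 0.822, 18.4

paulo_test_b = a * 10 + b        # substitute x = 10 into y = 0.822x + 18.4
giovanni_test_a = (90 - b) / a   # rearrange 90 = 0.822x + 18.4 for x

print(round(paulo_test_b), round(giovanni_test_a))  # prints: 27 87
```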

(c)

(i) Give a reason why this method is not appropriate for Giovanni.

(ii) Use an appropriate method to show that the estimated Test A score for Giovanni is 86 to the nearest integer.

Question 7

Consider the following bivariate data set where `p,q in ZZ^+`.

The regression line of `y` on `x` has equation `y = 2.1875x + 0.6875`.

The regression line passes through the mean point `(bar x, bar y)`.

(a) Given that `bar x = 7`, verify that `bar y = 16`.

(b) Given that `q - p = 3`, find the value of `p` and the value of `q`.
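
The check in part (a) relies only on the fact, stated above, that the regression line passes through the mean point. A minimal sketch of that substitution, using the coefficients given in the question:

```python
# Regression coefficients and mean of x, as given in the question.
a, b = 2.1875, 0.6875
x_bar = 7

# The line passes through (x_bar, y_bar), so y_bar = a * x_bar + b.
y_bar = a * x_bar + b
print(y_bar)  # 16.0
```

Part (b) then needs the data table, since `bar x` and `bar y` tie `p` and `q` to the sums of the `x` and `y` values.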

Question 8

Tania wishes to see whether there is any correlation between a person’s age and the number of objects on a tray which could be remembered after looking at them for a certain time.

She obtains the following table of results.

(a) Use your graphic display calculator to find the equation of the regression line of `y` on `x`.

(b) Use your equation to estimate the number of objects remembered by a person aged 28 years.

(c) Use your graphic display calculator to find the correlation coefficient `r`.

(d) Comment on your value for `r`.

Question 9

(i) The following values are the product-moment correlation coefficients for the five scatter diagrams below.

0.50, –0.95, –0.60, 0.00, 0.90

Match each scatter diagram with its corresponding correlation coefficient.

(ii) The annual advertising expenditures and sales, in dollars, for a small company are listed below. 

(a) Find the regression line `y = a + bx` for this data, and hence predict the annual sales that would result if 7000 dollars were spent in advertising. Give your answer to the nearest thousand dollars. 

(b) If the annual advertising expenditures and annual sales figures above were converted into Japanese yen (1 dollar = 69.4017 yen), which of the following quantities would change and which would remain the same?

  1. the mean `bar x` of the advertising expenditures;
  2. the standard deviation `s_x` of the advertising expenditures;
  3. the correlation coefficient `r`;
  4. the gradient of the regression line `y = a + bx`.
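
One way to explore part (b) is to rescale a small data set and compare the four quantities before and after. The sketch below does this with hypothetical dollar figures (the question's table is not reproduced here) and a hypothetical helper `summary`; the population form of the standard deviation is an assumption, and the sample form scales in the same way.

```python
from math import sqrt

def summary(xs, ys):
    """Mean and standard deviation of x, correlation coefficient, regression gradient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return {
        "mean_x": x_bar,
        "sd_x": sqrt(sxx / n),        # population form (assumption)
        "r": sxy / sqrt(sxx * syy),
        "gradient": sxy / sxx,
    }

RATE = 69.4017  # yen per dollar, as stated in the question

# Hypothetical figures in dollars, for illustration only.
dollars_x = [1000, 2000, 3000, 4000]
dollars_y = [12000, 15000, 21000, 24000]
yen_x = [RATE * x for x in dollars_x]
yen_y = [RATE * y for y in dollars_y]

before = summary(dollars_x, dollars_y)
after = summary(yen_x, yen_y)
for key in before:
    print(key, before[key], after[key])
```

Comparing the two columns of output shows which quantities scale with the exchange rate and which do not.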

 

(iii) Intelligence Quotient (IQ) in a certain population is normally distributed with a mean of 100 and a standard deviation of 15. 

(a) What percentage of the population has an IQ between 90 and 125? 

(b) If two persons are chosen at random from the population, what is the probability that both have an IQ greater than 125?

(c) The mean IQ of a random group of 25 persons suffering from a certain brain disorder was found to be 95.2. Is this sufficient evidence, at the 0.05 level of significance, that people suffering from the disorder have, on average, a lower IQ than the entire population? State your null hypothesis and your alternative hypothesis, and explain your reasoning.
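
The quantities needed in part (iii) can be computed with Python's standard `statistics.NormalDist`. This is a sketch under two assumptions that the question implies but does not state outright: the two people in (b) are chosen independently, and the test in (c) is a one-tailed z-test using the population standard deviation of 15.

```python
from math import sqrt
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# (a) P(90 < X < 125)
p_between = iq.cdf(125) - iq.cdf(90)

# (b) P(both > 125), assuming the two people are independent
p_above = 1 - iq.cdf(125)
p_both_above = p_above ** 2

# (c) one-tailed z-test of H0: mu = 100 against H1: mu < 100
n, sample_mean = 25, 95.2
z = (sample_mean - iq.mean) / (iq.stdev / sqrt(n))
p_value = NormalDist().cdf(z)  # lower-tail probability of the test statistic

print(round(p_between, 4), round(p_both_above, 5))
print(round(z, 3), round(p_value, 4))
```

The conclusion in (c) follows from comparing `p_value` with 0.05, or equivalently `z` with the lower-tail critical value.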

 

(iv) The manufacturer of ‘Fizz’, a popular soft drink, offered Millennium University one million dollars per year to make ‘Fizz’ the only soft drink sold on campus. A group of 140 students and teachers were consulted on whether the university should accept or reject the offer. Their responses are summarized in the following table:

(a) Construct a table of expected frequencies assuming that the attitude towards the offer is independent of whether the person is a student or a teacher. Do not apply Yates’ continuity correction.

(b) Calculate `chi^2` for this set of data.

(c) Based on the `chi^2` test at 0.01 level of significance, which of the following conclusions may be drawn?

  1. Students are more favourable than teachers towards accepting the offer.

  2. Students are less favourable than teachers towards accepting the offer.

  3. Acceptance or rejection does not depend on whether the person is a student or a teacher.

Explain briefly the reasons for your answers.
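
For part (iv), each expected frequency under independence is (row total × column total) / grand total, and `chi^2` is the sum of (observed − expected)² / expected over all cells. The sketch below shows that calculation without Yates' correction. The helper names and the observed counts are hypothetical; they are not the survey results from the question's table.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_squared(observed):
    """Chi-squared statistic, with no continuity correction."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical 2x2 table for illustration only, NOT the question's data.
observed = [[30, 20],
            [10, 40]]

expected = expected_frequencies(observed)
chi2 = chi_squared(observed)
print(expected)
print(chi2)
```

The statistic is then compared with the critical value for (rows − 1)(columns − 1) degrees of freedom at the chosen significance level.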

Question 10

(i) The mass of packets of a breakfast cereal is normally distributed with a mean of 750 g and a standard deviation of 25 g.

(a) Find the probability that a packet chosen at random has mass

  1. less than 740 g;
  2. at least 780 g;
  3. between 740 g and 780 g.

(b) Two packets are chosen at random. What is the probability that both packets have a mass which is less than 740 g? 

(c) The mass of 70% of the packets is more than `x` grams. Find the value of `x`.
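
Part (i) can be checked numerically with `statistics.NormalDist`. The sketch assumes the two packets in (b) are chosen independently, and reads (c) as `P(X > x) = 0.70`, which is equivalent to `P(X < x) = 0.30`.

```python
from statistics import NormalDist

mass = NormalDist(mu=750, sigma=25)

# (a) probabilities for a single packet
p_less_740 = mass.cdf(740)
p_at_least_780 = 1 - mass.cdf(780)
p_between = mass.cdf(780) - mass.cdf(740)

# (b) both packets under 740 g, assuming independence
p_both_less_740 = p_less_740 ** 2

# (c) P(X > x) = 0.70 is the same as P(X < x) = 0.30
x = mass.inv_cdf(0.30)

print(round(p_less_740, 4), round(p_at_least_780, 4), round(p_between, 4))
print(round(p_both_less_740, 4), round(x, 1))
```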

 

(ii) Three schools from the same city enter students for an examination in which successful candidates can achieve one of three grades: Pass, Credit, Distinction. The results are shown in the following table.

It may be assumed that a student's result is independent of the school attended.

(a) The following table gives the expected frequencies for the above data. 

  1. Calculate the values `a,b,c,d`.
  2. Find `chi^2` for this data.

(b) Newspapers wish to use these results to make comparisons between the schools. Based on the value of `chi^2`, decide whether there is justification for the statement 'success in the examination depends on which school is attended'. Examine the statement

  1. at the 5% level of significance;
  2. at the 10% level of significance.

 

(iii) A scientist is investigating the way in which the length of a metal rod varies with temperature. She reads the length `y` mm at different temperatures `x` °C. From a set of these readings, she calculates the following results.
`bar x = 200, bar y = 1000, s_x = 2.31, s_y = 11.7, s_{xy} = 26.1`

(a) Find

  1. the product-moment correlation coefficient `r`;
  2. the equation of the regression line of `y` on `x`;
  3. the length of the rod when the temperature is 170°C.
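
Part (a) can be worked directly from the summary statistics. The sketch assumes `s_{xy}` denotes the covariance of `x` and `y`, so that `r = s_{xy} / (s_x s_y)` and the gradient of the regression line of `y` on `x` is `s_{xy} / s_x^2`; the names `a` and `b` for the gradient and intercept are illustrative.

```python
# Summary statistics as given in the question.
x_bar, y_bar = 200, 1000
s_x, s_y, s_xy = 2.31, 11.7, 26.1

# Assumption: s_xy is the covariance of x and y.
r = s_xy / (s_x * s_y)   # product-moment correlation coefficient
a = s_xy / s_x ** 2      # gradient of the regression line of y on x
b = y_bar - a * x_bar    # intercept: the line passes through (x_bar, y_bar)

y_170 = a * 170 + b      # predicted length at 170 °C

print(round(r, 4), round(a, 3), round(b, 2), round(y_170, 1))
```

With `s_x = 2.31` and `bar x = 200`, a temperature of 170°C lies well outside the spread of the readings, so the prediction is an extrapolation and should be treated with caution.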

(b) Which of the following diagrams most closely resembles the set of readings taken by the scientist? Give a reason for your answer. 
