Part I - Problem Recognition Multiple Choice (36 marks ...

Part I - Problem Recognition (24 marks, 3 marks each: 2 for hypotheses, 1 for test) ALEX

1. A ski company in Whistler owns two ski shops, one near Whistler and one near Blackcomb. The following data were collected from both stores:

                  Mean sales   Sample std. dev.   Sample size
Whistler shop     $328         $104               35 days
Blackcomb shop    $435         $151               30 days

The company would like to test for a difference in daily average goggle sales between the two stores.

H0: µW − µB = 0
HA: µW − µB ≠ 0
Test: t test (unequal variances)
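The Python sketch below shows what the recognized test would compute. It evaluates Welch's t statistic and the Welch-Satterthwaite degrees of freedom from the summary table. The helper name `welch_t` is hypothetical. The sketch stops at the statistic because the Python standard library has no t distribution for critical values or p-values.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and approximate df for H0: mu1 - mu2 = 0."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the first sample mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the second sample mean
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df


# Whistler shop (W) versus Blackcomb shop (B), values from the summary table
t_stat, welch_df = welch_t(328, 104, 35, 435, 151, 30)
print(f"t = {t_stat:.3f}, df = {welch_df:.1f}")
```

With these figures, t is about -3.27 on roughly 50 degrees of freedom. Because HA is two-sided, |t| is what gets compared with the critical value.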

2. The state lottery office claims that the average household income of those people playing the lottery is greater than $37,000. They also know that the distribution of these households’ income is normal with a standard deviation of $5,756. To test their claim, a sample of 25 households was studied. It was found that the average income in the sample was $36,243.

H0: µ ≤ 37,000
HA: µ > 37,000
Test: z test
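Here is a minimal Python sketch of the z test for this claim. It assumes the stated population standard deviation of $5,756 is known exactly. The name `one_sample_z` is illustrative, and `statistics.NormalDist` supplies the standard normal CDF.

```python
import math
from statistics import NormalDist


def one_sample_z(xbar, mu0, sigma, n):
    """z statistic for a mean when the population standard deviation is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


z_stat = one_sample_z(36243, 37000, 5756, 25)
# Right-tailed test (HA: mu > 37,000), so the p-value is the upper-tail area
p_value = 1 - NormalDist().cdf(z_stat)
print(f"z = {z_stat:.4f}, p-value = {p_value:.4f}")
```

The sample mean is below $37,000, so z is negative (about -0.66). The right-tailed p-value is therefore above 0.5, and these data cannot support the office's claim.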

3. In random samples of 1000 people in the United States and in France, 70% of the people in the United States and 75% of the people in France indicated that they were positive about the future economy. Does this provide strong evidence that the people in France are more optimistic about the economy?

H0: pF − pUS ≤ 0
HA: pF − pUS > 0
Test: z test

4. The distributor of the Post, a regional newspaper serving North York, is considering three types of dispensing racks. Management wants to know if the different racks affect sales. These racks are designated as J-1000, D, and UV-57. Management also wants to know if the placement of the racks either inside or outside supermarkets affects sales. Each of six similar stores was randomly assigned a machine and location combination, and data were collected on the number of papers sold over four days.

Interaction: H0: no interaction; HA: interaction
Rack type: H0: µJ-1000 = µD = µUV-57; HA: at least two means differ
Location: H0: µinside = µoutside; HA: µinside ≠ µoutside
Test: two-way ANOVA
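For question 3, here is a sketch of the two-proportion z statistic in Python. It assumes the pooled-proportion standard error, which is the usual choice when H0 sets the difference to zero. The helper name `two_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for H0: p1 - p2 <= 0."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # overall proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# France (F) first, so that a positive z supports HA: pF - pUS > 0
z_stat = two_proportion_z(0.75, 1000, 0.70, 1000)
p_value = 1 - NormalDist().cdf(z_stat)  # right-tailed
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
```

Here z is about 2.50, with a right-tailed p-value of roughly 0.006.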

ERINDA

5. A pasta chef was experiencing difficulty in getting brands of pasta to be cooked just right. The main problem she experienced was with the speed of water absorption by the different pasta brands. Pasta with a faster rate of water absorption has a greater tendency to be overcooked. She decided to conduct an experiment in which two brands of pasta, one Canadian and one Italian, were cooked for either 4 or 8 minutes. The variable measured was the speed of water absorption in each case. The results were then recorded and analyzed.

Interaction: H0: no interaction; HA: interaction
Brand: H0: µC = µI; HA: µC ≠ µI
Cooking time: H0: µ4 = µ8; HA: µ4 ≠ µ8
Test: two-way ANOVA

6. A large milling machine produces steel rods to certain specifications. The machine is considered to be running normally if the standard deviation of the diameter of the rods is 0.15 millimeters. As line supervisor, you need to test to see whether the machine is operating normally. You take a sample of 25 rods and find that the sample standard deviation is 0.19.

H0: σ² = 0.0225
HA: σ² ≠ 0.0225
Test: χ² test

7. Are medical students more motivated than law students? A randomly selected group of each was administered a survey of attitudes toward life, which measures motivation for upward mobility. The scores are summarized below (higher scores mean greater motivation).

                   Sample size   Mean score   Pop. std. dev.
Medical students   250           83.5         11.2
Law students       100           80.2         9.2

H0: µM − µL ≤ 0
HA: µM − µL > 0
Test: z test

8. In a recent survey, college students were asked the amount of time (in hours) they spend weekly watching television and surfing the Internet. The researchers were interested in determining whether the time spent on both activities was equal. They collected the following data:

Person #   1   2   3   4   5   6   7   8
Internet   2   7   3   8   9   15  7   2
TV         4   15  5   3   4   4   4   8

H0: µd = 0
HA: µd ≠ 0
Test: paired t test
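The statistics behind questions 6 to 8 can be sketched in a few lines of Python. The function names are hypothetical. Question 7 uses the z form because the table gives population standard deviations. Question 8 works on the per-person differences d = Internet − TV.

```python
import math
import statistics


def chi_square_variance(s, sigma0, n):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for a test on one variance."""
    return (n - 1) * s ** 2 / sigma0 ** 2


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for mu1 - mu2 when both population std. devs. are known."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def paired_t(x, y):
    """Paired t statistic for H0: mu_d = 0 with d = x - y (df = len(x) - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Question 6: sigma0 = 0.15 mm (sigma0^2 = 0.0225), s = 0.19, n = 25, df = 24
chi2_stat = chi_square_variance(0.19, 0.15, 25)

# Question 7: medical (M) minus law (L) students
z_stat = two_sample_z(83.5, 11.2, 250, 80.2, 9.2, 100)

# Question 8: Internet minus TV hours for the same eight people
internet = [2, 7, 3, 8, 9, 15, 7, 2]
tv = [4, 15, 5, 3, 4, 4, 4, 8]
t_stat = paired_t(internet, tv)

print(round(chi2_stat, 4), round(z_stat, 4), round(t_stat, 4))
```

For these data, the sketch gives χ² ≈ 38.51 on 24 df, z ≈ 2.84, and t = 1/3 on 7 df. The mean difference is 0.75 hours, and the standard deviation of the differences is about 6.36.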

Part II – True/False (15 marks, 1 mark each) VANESSA True

False

1.

In a simple regression model, if the regression model is deemed to be statistically significant, it means that the regression slope coefficient is significantly greater than zero

2.

In a hypothesis test, the p-value measures the probability that the alternative hypothesis is true.

X

3.

If a hypothesis test is conducted for a population mean where only non-negative values can be sampled, a null and alternative hypothesis of the form: H0 : μ = 100, Ha : μ ≠ 100, will result in a one-tailed hypothesis test since the statistic can only assume nonnegative values

X

4.

Two variables have a correlation coefficient that is very close to zero. This means that there is no relationship between the two variables.

X

5.

All other things held constant, increasing the level of confidence for a confidence interval estimate for the difference between two population means will result in a wider confidence interval estimate.

X

6.

The method used in regression analysis for incorporating a categorical variable (no. of categories = 5) into the model is by organizing the categorical variable into five dummy variables.

X

7.

In a recent one-way ANOVA test, Mean SSW was equal to 1,590 and the Mean SSB was equal to 310. Therefore, SST is equal to 1,900.

X

8.

A local medical center has advertised that the mean wait for services will be less than 15 minutes (but more than 0 minutes). Given this claim, the hypothesis test for the population mean should be a one-tailed test with the rejection region in the lower (left-hand) tail of the sampling distribution.

X

9.

Consider the following regression equation: ŷ = 356 + 18.0x1 – 2.5x2. The x1 variable is a quantitative variable and the x2 variable is a dummy with values 1 and 0. Given this, we can interpret the slope coefficient on variable x2 as follows: holding x1 constant, if the value of x2 is changed from 1 to 0, the average value of y will increase by 2.5 units.

X

10.

The coefficient of determination measures the percentage of variation in the independent variable that is explained by the dependent variables in the model.

X

11.

A perfect correlation between two variables will always produce a correlation coefficient of +1.0.

X

12.

The prediction interval developed from a simple linear regression model will be at its narrowest point when the value of x used to predict y is equal to the mean value of x.

13.

When testing a hypothesis about the variability of a population, the statistical requirements call for us to convert the variance to standard deviation and run a chi-square test.

X

14.

When the expected cell frequencies are smaller than 30, the cells should be combined in a meaningful way such that the expected cell frequencies do exceed 30.

X

15.

If it is known that a simple linear regression model explains 56 percent of the variation in the dependent variable and that the slope on the regression equation is negative, then we also know that the correlation between x and y is approximately (0.75)

16.

In estimating the difference between two population means, if a 95 percent confidence interval includes zero, then we can conclude that there is a 95 percent chance that the difference between the two population means is zero.

17.

In a multiple regression analysis, even if only some of the independent variables have values equal to zero, the regression intercept, b0, can still be meaningful.

X

X

X

X

X

Part III – Computer Output Interpretation (16 Marks, 2 marks each) FADY

Random samples of two freshmen, two sophomores, two juniors, and two seniors each from four dormitories were asked to rate on a scale from 1 (poor) to 10 (excellent) the quality of the dormitory environment for studying. The results are shown in the table.

                 Year
Dormitory        Freshman   Sophomore   Junior   Senior
D1               7          6           5        7
                 5          8           4        4
D2               8          5           7        6
                 6          5           6        8
D3               9          7           6        7
                 8          8           7        5
D4               9          8           7        6
                 9          9           8        7

Given the following ANOVA table, answer questions 16 through 21:

SUMMARY (Freshman)   A    B    C    D    Total
Count                2    2    2    2    8
Sum                            A
Average                        B
Variance                       C

ANOVA
Source of Variation   SS         df   MS        F          P-value
Sample                10.59375   3    3.53125   3.054054   0.058694
Columns               20.34375   3    6.78125   5.864865   0.006706
Interaction           16.03125   9    1.78125   1.540541   0.215963
Within                18.5       16   1.15625
Total                 65.46875   31

Answers:

16. What is the value of A?

17

17. What is the value of B?

8.5

18. What is the value of C?

0.5

19. Can we conclude that the effect of the four dormitories is uniform across all student groups?
a. Yes, and therefore we have to test separately for the individual effects of dormitories and student groups.
b. Yes, and therefore we can go ahead and interpret the individual effects of dormitories and student groups from the above table.
c. No, and therefore we have to test separately for the individual effects of dormitories and student groups.
d. No, and therefore we can go ahead and interpret the individual effects of dormitories and student groups from the above table.

20. What is the smallest alpha for which you can reject the null hypothesis for differences between the four student groups?
a. .05
b. .025
c. .01
d. I am unable to reject the null hypothesis for any of these values.

21. What is the smallest alpha for which you can reject the null hypothesis for differences in the four dormitories?
a. .05
b. .025
c. .01
d. I am unable to reject the null hypothesis for any of these values.
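The sums of squares in the Part III ANOVA table can be reproduced with a minimal pure-Python sketch. It assumes the D2 and D3 ratings are the ones listed in the `ratings` dictionary. With these values, every sum of squares in the table is reproduced exactly. In the Excel output, "Sample" corresponds to the year (student group) factor and "Columns" to the dormitory factor. The function and variable names are illustrative.

```python
import statistics

# ratings[dorm] = one (first, second) rating pair per year, in the order
# Freshman, Sophomore, Junior, Senior (assumed reading of the D2 and D3 rows)
ratings = {
    "D1": [(7, 5), (6, 8), (5, 4), (7, 4)],
    "D2": [(8, 6), (5, 5), (7, 6), (6, 8)],
    "D3": [(9, 8), (7, 8), (6, 7), (7, 5)],
    "D4": [(9, 9), (8, 9), (7, 8), (6, 7)],
}


def two_way_anova_ss(data):
    """Sums of squares for a balanced two-factor design with replication."""
    rows = list(data.values())                 # one row per dormitory
    n_dorms, n_years = len(rows), len(rows[0])
    reps = len(rows[0][0])                     # replicates per cell
    values = [x for row in rows for cell in row for x in cell]
    grand = statistics.mean(values)

    ss_total = sum((x - grand) ** 2 for x in values)
    # Dormitory factor (Excel "Columns"): each dorm mean uses n_years * reps ratings
    dorm_means = [statistics.mean([x for cell in row for x in cell]) for row in rows]
    ss_dorm = n_years * reps * sum((m - grand) ** 2 for m in dorm_means)
    # Year factor (Excel "Sample"): each year mean uses n_dorms * reps ratings
    year_means = [statistics.mean([x for row in rows for x in row[j]])
                  for j in range(n_years)]
    ss_year = n_dorms * reps * sum((m - grand) ** 2 for m in year_means)
    # Within cells: spread of each replicate around its own cell mean
    ss_within = sum((x - statistics.mean(cell)) ** 2
                    for row in rows for cell in row for x in cell)
    # Interaction is what remains of the total after the other three pieces
    ss_interaction = ss_total - ss_dorm - ss_year - ss_within
    return {"year": ss_year, "dorm": ss_dorm, "interaction": ss_interaction,
            "within": ss_within, "total": ss_total}


ss = two_way_anova_ss(ratings)
ms_within = ss["within"] / 16          # df within = 4 * 4 * (2 - 1)
f_dorm = (ss["dorm"] / 3) / ms_within  # df dormitory = 4 - 1
# SUMMARY block: A, B, C are the Sum, Average and Variance of the Freshman cell in column C (D3)
freshman_d3 = ratings["D3"][0]
a_sum = sum(freshman_d3)
b_avg = statistics.mean(freshman_d3)
c_var = statistics.variance(freshman_d3)
print(ss, round(f_dorm, 6), a_sum, b_avg, c_var)
```

Dividing each SS by its df and then by MS within (1.15625) gives the F column. For example, the dormitory F is 6.78125 / 1.15625 ≈ 5.86.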

Part IV - Short Answer Questions (35 Marks, Individually Weighted)

Question 1 (6 marks) HILA

Traditionally, a professor likes to assign grades according to the following breakdown: 15% A’s, 25% B’s, 40% C’s, 15% D’s, and 5% F’s. This year, she gave out 17 A’s, 35 B’s, 60 C’s, 10 D’s and 2 F’s. Is there any evidence that the professor has changed her grading scheme? (Use α = 0.05.)

Solution:

Hypotheses (2 marks):
H0: P1 = 0.15, P2 = 0.25, P3 = 0.40, P4 = 0.15, and P5 = 0.05 (also acceptable: the distribution is…)
H1: At least one Pi is not equal to its specified value (the distribution is not…)

Test (3 marks):

Grade                A        B        C        D        F        Total
Observed frequency   17       35       60       10       2        124
Expected frequency   18.6     31       49.6     18.6     6.2      124
(o − e)²/e           0.1376   0.5161   2.1806   3.9763   2.8452   9.6559

n = 124, df = 4, χ²0.05,4 = 9.4877
Test statistic: 9.6559. Since 9.6559 > 9.4877, reject H0.

Interpretation (1 mark): At α = 0.05, there is enough statistical evidence to conclude that the professor has changed her grading scheme.
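The goodness-of-fit arithmetic in this solution can be checked with a short Python sketch. `chi_square_gof` is an illustrative helper. The critical value is the table value quoted in the solution, not something the code computes.

```python
def chi_square_gof(observed, proportions):
    """Goodness-of-fit statistic sum((o - e)^2 / e) with expected e = n * p."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


observed = [17, 35, 60, 10, 2]                # A, B, C, D, F given this year
proportions = [0.15, 0.25, 0.40, 0.15, 0.05]  # traditional breakdown under H0
stat, expected = chi_square_gof(observed, proportions)
CRITICAL_0_05_DF4 = 9.4877  # chi-square table value quoted in the solution
reject_h0 = stat > CRITICAL_0_05_DF4
print([round(e, 1) for e in expected], round(stat, 4), reject_h0)
```

All expected counts are at least 5 (the smallest is 6.2), so the usual condition for the chi-square approximation is met.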

The following refers to questions 2 and 3: A professor of business statistics teaching a large lecture wanted to study scores on the three exams that are given during the semester. The exams each cover one portion of the semester and are not cumulative. The results for a sample of 33 students were as follows:

Student    1   2   3   4   5   6   7   8   9   10  11
Exam I     89  80  86  68  88  89  82  89  42  61  84
Exam II    80  68  76  77  95  66  83  86  58  54  84
Exam III   74  74  83  71  85  65  88  54  52  62  51

Student    12  13  14  15  16  17  18  19  20  21  22
Exam I     56  67  99  82  75  58  56  55  72  73  79
Exam II    71  55  95  45  71  44  50  14  59  80  68
Exam III   68  48  77  73  64  52  14  25  75  70  84

Student    23  24  25  26  27  28  29  30  31  32  33
Exam I     63  89  62  74  62  70  65  82  91  84  95
Exam II    43  80  23  92  57  51  78  53  90  83  88
Exam III   56  77  48  68  55  61  70  58  96  67  90

The professor has also calculated the following sample statistics:

                     Exam I    Exam II   Exam III
Mean                 74.7576   67.1818   65.3030
Standard deviation   13.7887   20.0007   17.3286

Question 2 (2 × 6 marks = 12 marks) OLGA

a. At the 0.05 level of significance, is there evidence of a difference in the students’ grades on exam II and exam III?

b. Students complained that exam II was much more difficult than exam I. The professor decides to test this by looking at the variances within the two exams. A higher variance in grades usually indicates a more difficult exam. At the 0.05 level of significance, are the students correct?

Solution

Part (a): test of mean difference (d = Exam II − Exam III)
Mean difference: average 1.8788, standard deviation 15.3659

Hypotheses (2 marks): H0: µd = 0; HA: µd ≠ 0
Test (3 marks): t = 1.8788/[15.3659/√33] = 0.7024
t crit (0.025, 30) = 2.0423; t crit (0.025, 40) = 2.0211
Either way, do not reject the null hypothesis.
Interpretation (1 mark): We cannot conclude that the grades on exams II and III are different.

Part (b): test of variances
Hypotheses (2 marks): H0: σ²II / σ²I ≤ 1; HA: σ²II / σ²I > 1
Test (3 marks): F statistic = 400.03/190.1282 = 2.104
For Fcrit (0.05, 32, 32), use either Fcrit (0.05, 30, 30) = 1.841 or Fcrit (0.05, 40, 40) = 1.693
Either way, reject the null hypothesis.
Interpretation (1 mark): There is evidence to conclude, at the 0.05 level, that exam II has greater variability / was more difficult / that the students were correct.
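Both statistics in this solution can be recomputed from the raw scores with Python's `statistics` module. In this sketch, the score lists are transcribed from the table of 33 students. The critical values still come from t and F tables, because the standard library provides no t or F quantile functions.

```python
import math
import statistics

# Scores for students 1-33, transcribed from the three blocks of the score table
exam1 = [89, 80, 86, 68, 88, 89, 82, 89, 42, 61, 84,
         56, 67, 99, 82, 75, 58, 56, 55, 72, 73, 79,
         63, 89, 62, 74, 62, 70, 65, 82, 91, 84, 95]
exam2 = [80, 68, 76, 77, 95, 66, 83, 86, 58, 54, 84,
         71, 55, 95, 45, 71, 44, 50, 14, 59, 80, 68,
         43, 80, 23, 92, 57, 51, 78, 53, 90, 83, 88]
exam3 = [74, 74, 83, 71, 85, 65, 88, 54, 52, 62, 51,
         68, 48, 77, 73, 64, 52, 14, 25, 75, 70, 84,
         56, 77, 48, 68, 55, 61, 70, 58, 96, 67, 90]

# Part (a): paired t test on d = Exam II - Exam III (df = 32)
d = [b - c for b, c in zip(exam2, exam3)]
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)
t_stat = d_bar / (s_d / math.sqrt(len(d)))

# Part (b): right-tailed F = s^2(Exam II) / s^2(Exam I) with df = (32, 32)
f_stat = statistics.variance(exam2) / statistics.variance(exam1)

print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t_stat:.4f}, F = {f_stat:.3f}")
```

The computed values agree with the solution to the reported precision (t ≈ 0.7024, F ≈ 2.104). |t| stays below both tabled t critical values, and F exceeds both tabled F critical values.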

Question 3 (10 marks) DORIT

The professor would like to use ANOVA to test for differences in the averages of all three exams. Construct the appropriate ANOVA table and analyze the data accordingly. The following values have already been calculated: SSB = 1653.4141, SS Total = 30147.3535, MSW = 107.03.

Correct solution

ANOVA
Source of Variation   SS            df   MS            F          F crit
Rows (students)       21644.0202    32   676.3756313   6.319497   1.623862
Columns (exams)       1653.414141   2    826.7070707   7.72407    3.140438
Error                 6849.919192   64   107.030
Total                 30147.35354   98

Table: 0.5 marks × 12 = 6 marks
1) Blocking is necessary (2 marks): 1 for hypotheses; 1 for conclusion
2) There is a difference between the three exams (2 marks): 1 for hypotheses; 1 for conclusion

Incorrect answer: one-way ANOVA

ANOVA
Source of Variation   SS            df   MS            F         F crit
Between Groups        1653.414141   2    826.7070707   2.78529   3.091191
Within Groups         28493.93939   96   296.8118687
Total                 30147.35354   98

1) There is no significant difference between exams.
If done correctly, give 5 marks: 0.5 × 8 for the table; 1 for the conclusion (5-mark penalty for the wrong test).

* Note: calculations were traced as far as possible for partial marks.
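The correct (blocked) ANOVA table can be reproduced with a minimal pure-Python sketch. Each exam is treated as a treatment and each student as a block. The helper `randomized_block_anova` is hypothetical. The score lists are transcribed from the table of 33 students.

```python
import statistics

exam1 = [89, 80, 86, 68, 88, 89, 82, 89, 42, 61, 84,
         56, 67, 99, 82, 75, 58, 56, 55, 72, 73, 79,
         63, 89, 62, 74, 62, 70, 65, 82, 91, 84, 95]
exam2 = [80, 68, 76, 77, 95, 66, 83, 86, 58, 54, 84,
         71, 55, 95, 45, 71, 44, 50, 14, 59, 80, 68,
         43, 80, 23, 92, 57, 51, 78, 53, 90, 83, 88]
exam3 = [74, 74, 83, 71, 85, 65, 88, 54, 52, 62, 51,
         68, 48, 77, 73, 64, 52, 14, 25, 75, 70, 84,
         56, 77, 48, 68, 55, 61, 70, 58, 96, 67, 90]


def randomized_block_anova(treatments):
    """SS, df and F for treatments (columns) with blocks (rows), one value per cell."""
    k = len(treatments)     # number of treatments (exams)
    b = len(treatments[0])  # number of blocks (students)
    values = [x for col in treatments for x in col]
    grand = statistics.mean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    # Between-treatment SS: each exam mean is based on b scores
    ss_treat = b * sum((statistics.mean(col) - grand) ** 2 for col in treatments)
    # Between-block SS: each student mean is based on k scores
    block_means = [statistics.mean(row) for row in zip(*treatments)]
    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_block = k - 1, b - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_block": ss_block,
        "ss_error": ss_error, "df_error": df_error, "ms_error": ms_error,
        "f_treat": (ss_treat / df_treat) / ms_error,
        "f_block": (ss_block / df_block) / ms_error,
    }


result = randomized_block_anova([exam1, exam2, exam3])
print({name: round(value, 4) for name, value in result.items()})
```

The sketch recovers SS Total = 30147.35, SSB (exams) = 1653.41, SS (students) = 21644.02 and SSE = 6849.92, matching the correct-solution table. Dropping the block term moves the students' variation into the error term (21644.02 + 6849.92 = 28493.94). That is why the one-way table's F falls to 2.79.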

Question 4 (3*3 marks = 9 marks) SAGGI A company in Maryland has developed a device that can be attached to car engines that they believe will increase the miles per gallon that cars will get. The owners are interested in estimating the difference between mean mpg for cars using the device versus those that are not using the device. The following data represent the mpg for random samples of cars from each population. With Device 22.6

Without Device 26.9

23.4

24.4

28.4

20.8

29.0 29.3

20.8 20.2

20.0

26.0 28.1 25.6

(a) Given these data, what is the critical value if the owners wish to have a 95 percent confidence interval estimate?
a. t = 2.1788
b. t = 1.7823
c. z = 1.645
d. None of the above.

(b) What is the upper limit for a 95 percent confidence interval estimate for the difference in mean mpg?
a. Approximately 3.8 mpg
b. About 5.4 mpg
c. Just under 25.0
d. None of the above.

(c) Which of the following statements is true?
a. Given the sample information, using 95 percent confidence, we can’t conclude that a difference exists in the population mean mpg between vehicles that use the new device versus vehicles that do not use the new device.
b. The sample information produces a 95 percent confidence interval that leads us to believe that a difference does exist between the population mean mpg between vehicles that use the new device versus vehicles that do not use the new device.
c. The sample sizes used are too small to produce a confidence interval estimate that could have any value in reaching a decision about the two engine devices.
d. None of the above.

* Note: If a student got a wrong value for (a), we accepted “none of the above” for (b) as a correct answer.

Question 5 (10 Marks) - ALAN

The Manager of Material Handling was analyzing the factors that influence the time it takes to unload trucks at the warehouse loading dock. She had a multiple regression model run, with time as the dependent variable and the following independent variables:

Boxes - Number of boxes
Weight - Total weight of the boxes (in hundreds of kilograms)
Experience - Years of experience of the person unloading the truck

The standard deviation of the unloading times is 15.7665 minutes. The partial Excel™ output is:

Regression Statistics
Multiple R           0.90547204
R Square             0.81987962
Adjusted R Square    0.80813264
Standard Error       6.90615778
Observations         50

ANOVA
             df   SS          MS         F        Significance F
Regression   3    9,986.61    3,328.87   69.795   3.77654E-17
Residual     46   2,193.97    47.695
Total        49   12,180.58

             Coefficients   Standard Error   t Stat     P-value
Intercept    -29.3899869    6.749432996      -4.35444   7.38E-05
Boxes        0.59672985     0.05454831       10.93947   2.17E-14
Weight       0.35986252     0.083109499      4.329981   7.99E-05
Experience   0.25659422     0.142467554      1.801071   0.078249

Required:

a. Fill in the missing values in the ANOVA table above. (4 marks) … 1 mark per column. No partial marks beyond that.

b. Write out the equation of the true regression model. (2 marks) … all or nothing

$y_i = \beta_0 + \beta_1(\text{Boxes}) + \beta_2(\text{Wt}) + \beta_3(\text{Exp.}) + \varepsilon_i$

c. Which variables should be retained and which should be dropped? (2 marks) … -1 for each mistake

Retained: Boxes, Weight
Dropped: Experience

d. Suppose that the manager believes that the shift (Day, Evening, or Night) influences the unloading time. Write out the new regression model. (2 marks)

Adding two indicator variables. If used two dummies max. -1; -1 if numerical coefficients used; -1 if no 0/1
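The part (a) ANOVA values can be rebuilt from the Regression Statistics block alone. This minimal sketch assumes that Excel's Standard Error is √MSE, so SSE = (Standard Error)² × (n − k − 1). It then uses R² = 1 − SSE/SST. The name `regression_anova` is illustrative.

```python
import math


def regression_anova(r_squared, std_error, n, k):
    """Rebuild the regression ANOVA table from R^2, the standard error, n and k."""
    df_reg, df_res = k, n - k - 1
    sse = std_error ** 2 * df_res  # Standard Error = sqrt(MSE)
    sst = sse / (1 - r_squared)    # R^2 = 1 - SSE / SST
    ssr = sst - sse
    msr, mse = ssr / df_reg, sse / df_res
    return {"df": (df_reg, df_res, n - 1), "ss": (ssr, sse, sst),
            "ms": (msr, mse), "f": msr / mse}


table = regression_anova(0.81987962, 6.90615778, n=50, k=3)
# Cross-check: SST / (n - 1) is the sample variance of the unloading times
sd_y = math.sqrt(table["ss"][2] / 49)
print(table, round(sd_y, 4))
```

As a cross-check, √(SST/49) returns about 15.7665, the standard deviation of unloading times given in the question.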