Regression Analysis of Cigarette Data

This dataset was taken from 25 brands of domestic cigarettes and includes measurements of their weight and their tar, nicotine, and carbon monoxide (CO) content.

After obtaining the data through the Internet (see source at end of page), I pasted it into an Excel worksheet and created scatterplots of tar vs. carbon monoxide and of nicotine vs. carbon monoxide. To view this file, click here: Download Excel 5.0 File of Cigarette Data. You can also download an Excel 3.0 file of the same data.

After you have downloaded the Excel file, you can either print a copy or keep it on the desktop. If you keep it on the desktop, you will have to flip back and forth between it and the web browser to answer the following questions, but you can use the functions of Excel to perform some of the calculations below.



Lesson:

(If you do not understand an underlined term,
then click on that word to see its definition.)



Questions 1-8 will deal with level of tar (X) versus carbon monoxide (Y):


1. What is the mean amount of tar in the cigarettes? What is the mean amount of carbon monoxide emitted?



2. What are the variance and standard deviation of X and of Y?



3. What is the covariance of X and Y?
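
As a minimal sketch of the calculations behind questions 1-3, the Python below computes the mean, variance, standard deviation, and covariance for a small made-up data set (not the cigarette data). The names `x`, `y`, and the helper functions are illustrative assumptions. The sketch uses the sample (n - 1) divisor throughout. In Excel, VAR and STDEV divide by n - 1, while VARP, STDEVP, and COVAR divide by n, so it is worth checking which convention each function follows.

```python
import math

# Hypothetical illustration data, NOT the cigarette measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_covariance(a, b):
    """Sum of products of deviations from the means, divided by n - 1."""
    mean_a, mean_b = mean(a), mean(b)
    products = ((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sum(products) / (len(a) - 1)

def sample_variance(values):
    # The variance is the covariance of a variable with itself.
    return sample_covariance(values, values)

mean_x, mean_y = mean(x), mean(y)
var_x, var_y = sample_variance(x), sample_variance(y)
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
cov_xy = sample_covariance(x, y)

print("means:", mean_x, mean_y)
print("variances:", var_x, var_y)
print("standard deviations:", sd_x, sd_y)
print("covariance:", cov_xy)
```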



4. The slope of the regression line can be found by dividing the covariance of X and Y by the variance of X. Find the slope:
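
The slope formula only works if the covariance and the variance use the same divisor (both n - 1, or both n). The sketch below, on made-up data rather than the cigarette data, shows that the two consistent conventions give the same slope and that mixing them does not. All variable names are illustrative assumptions.

```python
# Hypothetical illustration data, NOT the cigarette measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of squared deviations of X and sum of cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Sample convention: covariance and variance both divided by n - 1.
slope_sample = (sxy / (n - 1)) / (sxx / (n - 1))

# Population convention: covariance and variance both divided by n.
slope_population = (sxy / n) / (sxx / n)

# Mixed conventions: the result is off by a factor of (n - 1) / n.
slope_mixed = (sxy / n) / (sxx / (n - 1))

print(slope_sample, slope_population, slope_mixed)
```

If you work in Excel, this means pairing COVAR (which divides by n) with VARP rather than VAR.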



5. The y-intercept of the regression line can be found by using X mean and Y mean along with the slope of the line: y intercept = Y mean - slope * X mean. Find the y-intercept in this case:
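
The intercept formula follows from the fact that a least-squares line passes through the point (X mean, Y mean). A short sketch, using made-up values for the means and slope rather than results from the cigarette data:

```python
# Hypothetical values for illustration, NOT results from the cigarette data.
mean_x = 3.0
mean_y = 4.0
slope = 0.6

# y-intercept = Y mean - slope * X mean
intercept = mean_y - slope * mean_x

# Check: evaluating the line at X mean should give back Y mean.
fitted_at_mean = intercept + slope * mean_x

print("intercept:", intercept)
print("line evaluated at X mean:", fitted_at_mean)
```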



6. Combining your answers to #4 and #5, write the final regression equation that best describes the relationship between tar and carbon monoxide content of a cigarette:



7. Based on your regression equation, how much carbon monoxide should be emitted from a cigarette with 10 mg of tar? with 20 mg of tar?
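
Once the slope and intercept are known, a prediction is just the regression equation evaluated at the chosen X. The sketch below assumes a hypothetical helper `predict` and made-up coefficients; substitute your own answers to #4 and #5.

```python
def predict(x_value, slope, intercept):
    """Return the fitted Y value for a given X on the regression line."""
    return intercept + slope * x_value

# Hypothetical coefficients for illustration, NOT the cigarette results.
slope = 0.6
intercept = 2.2

for x_value in (10, 20):
    print(x_value, predict(x_value, slope, intercept))
```

Predictions are generally most trustworthy for X values inside the range covered by the data.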



8. Find the correlation coefficient (r) of tar and carbon monoxide:
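
One common way to compute r is to divide the covariance of X and Y by the product of their standard deviations. The sketch below does this on made-up data (not the cigarette data) and also checks the equivalent form r = slope * (SD of X / SD of Y). Variable names are illustrative assumptions.

```python
import math

# Hypothetical illustration data, NOT the cigarette measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# The divisor (n or n - 1) cancels, so r can be computed from the raw sums.
r = sxy / math.sqrt(sxx * syy)

# Equivalent form: slope rescaled by the ratio of standard deviations.
slope = sxy / sxx
r_from_slope = slope * math.sqrt(sxx / syy)

print("r:", r)
print("r from slope:", r_from_slope)
```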





Questions 9-16 will deal with level of nicotine (X) versus carbon monoxide (Y).
(Some of your answers for carbon monoxide will be the same as above.):


9. What is the mean amount of nicotine in the cigarettes? What is the mean amount of carbon monoxide emitted?



10. What are the variance and standard deviation of X and of Y?



11. What is the covariance of X and Y?



12. Find the slope of the regression line:



13. Find the y-intercept of the regression line:



14. Write the final regression equation that best describes the relationship between nicotine and carbon monoxide content of a cigarette:



15. Based on your regression equation, how much carbon monoxide should be emitted from a cigarette with 1 mg of nicotine? with 2 mg of nicotine?



16. Find the correlation coefficient (r) of nicotine and carbon monoxide:





Comparison of tar and nicotine's relationship to carbon monoxide:

17. Based on the correlation coefficients of each pair of variables, which seems to be more strongly related to the amount of carbon monoxide emitted from a cigarette: tar or nicotine?
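
When comparing two correlation coefficients, the usual approach is to compare their absolute values (the sign only gives the direction of the relationship). The square of r can be read as the share of the variation in Y accounted for by the fitted line. A brief sketch with made-up coefficients and hypothetical predictor names:

```python
# Hypothetical correlation coefficients, NOT values from the cigarette data.
correlations = {"predictor A": 0.80, "predictor B": -0.90}

# Strength of a linear relationship is judged by |r|, not by the sign.
strongest = max(correlations, key=lambda name: abs(correlations[name]))

for name, r in correlations.items():
    print(name, "| |r| =", abs(r), "| r squared =", round(r ** 2, 4))
print("strongest linear relationship:", strongest)
```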



18. From the data provided and your analysis of it, which brand of cigarette would you consider the "least harmful" to one's health? Explain your choice.






To see the regression lines graphed onto the scatterplots and to check your calculations above, you can download the answer version of this Excel file. Click here to check your answers.



This set of data was downloaded through the Internet from the following source: gopher://jse.stat.ncsu.edu/11/jse under the folder "JSE Dataset Archive". The data presented there is taken from Mendenhall and Sincich (1992) and is a subset of the data produced by the Federal Trade Commission. It was submitted by Lauren McIntyre, Department of Statistics, North Carolina State University.