Assignment #3
1. Use the data of 25 countries in Question 7 of Chapter 6 (page 263), to answer the following questions.
(1.a) Construct the scatter plots of Imports versus GDP and describe the pattern of relationship.
[Figure: scatter plot of Imports versus GDP, with one observation labeled "Leverage point"]
There is a moderate positive linear relationship between Imports and GDP, except for one outlier. Remark: an outlier is an observation that is unusual in terms of y (the output), whereas a leverage point is an observation that is unusual in terms of x (the input).
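The distinction in the remark can be checked numerically. The following is a minimal sketch, assuming a simple linear regression with one input: it computes the leverage h_i = 1/n + (x_i - mean(x))^2 / Sxx of each observation and flags those above the common rule-of-thumb cutoff 2p/n, with p = 2 parameters. The GDP values are hypothetical, not the textbook data, and the function name `leverages` is illustrative only.

```python
def leverages(x):
    """Leverage of each observation in a one-input linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical GDP values (billion US$); the last one is far from the rest.
gdp = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 200.0]

h = leverages(gdp)
cutoff = 2 * 2 / len(gdp)  # rule of thumb 2p/n with p = 2 (intercept and slope)
flagged = [i for i, hi in enumerate(h) if hi > cutoff]
print(flagged)
```

Leverage depends only on the x values, so a point can have high leverage without being an outlier in y.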
(1.b) Run the regression using Imports as the response variable and GDP as the input variable, save the
residuals. Interpret the regression results. Test the normality assumption of residuals at 5% level of significance.
When GDP increases by 1 unit (billion US$), Imports increase by 0.0906 units (billion US$) on average. The model does not fit the data well: R-squared is only 15.8%, so GDP explains only 15.8% of the variation in Imports.
All three normality tests reject the null hypothesis of normality at the 5% level, i.e., the normality assumption is violated. In practice, you only need to report one of them. The normal score plot also deviates from a straight line, which supports the conclusion that the normality assumption is violated.
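The workflow in (1.b) can be sketched in a few lines: fit the line, compute R-squared, save the residuals, and test them for normality. The source does not name the three tests it used, so this sketch swaps in the Jarque-Bera test, chosen only because it can be written with the standard library. Its chi-square approximation is rough for a sample of 25, so treat the p-value as indicative. The data and the function names `fit_line` and `jarque_bera` are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, residuals).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum(e * e for e in residuals)
    return slope, intercept, 1.0 - sse / sst, residuals


def jarque_bera(residuals):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 df)."""
    n = len(residuals)
    mean_e = sum(residuals) / n
    m2 = sum((e - mean_e) ** 2 for e in residuals) / n
    m3 = sum((e - mean_e) ** 3 for e in residuals) / n
    m4 = sum((e - mean_e) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square variable with 2 df is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


# Hypothetical data (billion US$), not the textbook values.
gdp = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
imports = [30.0, 35.0, 70.0, 60.0, 95.0, 90.0]

slope, intercept, r_squared, residuals = fit_line(gdp, imports)
jb_stat, p_value = jarque_bera(residuals)
print(slope, r_squared, p_value)
```

A p-value below 0.05 would reject normality at the 5% level, matching the decision rule used in the answer.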
(1.c) Omit the data of the USA and the Netherlands and run the regression again. Check the residual assumptions
and interpret the results.
In this case, when GDP increases by 1 unit (billion US$), Imports increase by 0.112 units (billion US$) on average.
Again, all three normality tests reject the null hypothesis of normality, i.e., the normality assumption is still violated.
(1.d) Create the log-transformed Imports and GDP for the data omitting USA and Netherlands.
(1.f) Run the regression with log-import as response and log-GDP as input. Interpret the results and check the
normality assumption of residuals.
R-squared increases further after the log transformation, and the residuals look closer to normal and more random.
All three normality tests fail to reject the null hypothesis of normality, i.e., there is no evidence that the
normality assumption is violated. The residuals appear to follow a normal distribution.
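In a log-log model the slope has a different reading from the earlier fits: it approximates the percentage change in Imports associated with a 1% change in GDP (an elasticity). The following is a minimal sketch of the transformation and fit from (1.d) and (1.f), assuming natural logarithms. The data are generated from a hypothetical power law so that the recovered slope is known in advance, and `fit_loglog` is an illustrative name.

```python
import math


def fit_loglog(x, y):
    """Least-squares fit of log(y) = intercept + slope * log(x)."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data following Imports = 0.5 * GDP^0.8 exactly.
gdp = [50.0, 100.0, 200.0, 400.0, 800.0]
imports = [0.5 * g ** 0.8 for g in gdp]

elasticity, log_constant = fit_loglog(gdp, imports)
print(elasticity, math.exp(log_constant))
```

With real data the points will not lie exactly on a power law, so the residuals from this fit are what the normality tests are applied to.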
(1.g) Which of the results in (1.b), (1.c) and (1.f) best describe the data? Explain.
The model in (1.f) describes the data best: it has the highest R-squared, and its residuals look normal, random and stable.
2. Use the data in the file ‘data-Q2.xls’.
(2.a) Run a regression model with violent crime rate as the response variable and GDP, Consumption, Private
investment, Fixed investment, Government expenditure, State and local expenditures, Unemployment rate and Total poverty rate as input variables. Save the residuals and the Durbin-Watson (DW) statistic.
DW=1.107
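For reference, the DW statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, and values near 2 indicate no first-order autocorrelation. A value such as 1.107 lies below 2, which points toward positive autocorrelation, although a formal conclusion needs the tabulated lower and upper bounds for the sample size and number of inputs. A minimal sketch of the calculation follows; the residual series are hypothetical and `durbin_watson` is an illustrative name.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series, in time order.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # long runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0]             # sign flips every period

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)
print(dw_persistent, dw_alternating)
```

The persistent series gives a value below 2 and the alternating series a value above 2, which is the pattern used to read the statistic.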