A random sample of twenty employee records produces the following table. The sample is taken by selecting every 30th employee file from a cabinet containing 650 employee records, starting from the 10th file. In addition to salary (in thousands), we record each employee's age, years of experience, gender (coded as a binary variable: 1 = male, 0 = female), and years of college education (coded 6 = master's degree, 4 = bachelor's degree, 2 = associate's degree, 0 = no college degree).
Profile of twenty employees of Ephemeral Products

NAME     SALARY  AGE  EXP  GEND  EDUC
Mary         22   20    0     0     2
Brenda       37   31    4     0     6
Alicia       60   44   14     0     6
Tom          22   20    0     1     4
Nicole       76   55   25     0     2
Bill         56   44   14     1     0
Gillian      32   25    5     0     4
Bob          80   55   25     1     6
Vivian       68   50   20     0     2
Cecil        86   60   30     1     4
Barney       55   40   15     1     0
Jack         91   64   29     1     0
Wanda        45   35   10     0     2
Sam          83   64   19     1     2
Saundra      55   40   15     0     6
Pete         37   31    4     1     4
Steve        32   25    5     1     6
Fred         45   35   10     1     2
Dick         80   60   21     1     2
Lee          70   50   20     1     4
(5.a) Plot salary against gender. Does the plot indicate any difference in salary due to gender?
From the graph, the male group has a slightly higher maximum salary and a larger spread (range), but it is hard to tell whether the salaries of the two groups differ significantly. In fact, a two-sample t test suggests no significant difference (T-Value = -1.27, P-Value = 0.110).
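The reported T-value can be checked by hand. The sketch below assumes the unpooled (Welch) form of the two-sample test, which the text does not name, and groups the salaries by GEND as read from the table. The function name `welch_t` is illustrative only. It uses just the standard library, so the P-value, which needs the t distribution, is not computed.

```python
from math import sqrt
from statistics import fmean, variance

# Salaries (in thousands) grouped by GEND, as read from the table above.
female = [22, 37, 60, 76, 32, 68, 45, 55]                 # GEND = 0
male = [22, 56, 80, 86, 55, 91, 83, 37, 32, 45, 80, 70]   # GEND = 1


def welch_t(x, y):
    """Return (t, df) for the unpooled two-sample t test of mean(x) - mean(y)."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (fmean(x) - fmean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch_t(female, male)
print(f"t = {t_stat:.2f}, approximate df = {df:.1f}")
```

Under these assumptions the statistic comes out close to the reported -1.27, with the sign reflecting female minus male.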
(5.b) Consider the regression model
$$\text{Salary} = \beta_0 + \beta_1\,\text{GEND} + e \tag{5.1}$$
Test the hypothesis $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ at the 5% level of significance.
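A minimal sketch of the test in (5.b), assuming a two-sided alternative and the SALARY and GEND columns as read from the table. With a 0/1 regressor the intercept estimate is the female mean salary and the slope estimate is the male mean minus the female mean, so this is the pooled-variance counterpart of the two-sample comparison. The critical value 2.101 is taken from standard t tables for 18 degrees of freedom; the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean

# SALARY (thousands) and GEND (1 = male, 0 = female), as read from the table.
salary = [22, 37, 60, 22, 76, 56, 32, 80, 68, 86,
          55, 91, 45, 83, 55, 37, 32, 45, 80, 70]
gend = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
        1, 1, 0, 1, 0, 1, 1, 1, 1, 1]

n = len(salary)
x_bar, y_bar = fmean(gend), fmean(salary)
sxx = sum((x - x_bar) ** 2 for x in gend)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gend, salary))

b1 = sxy / sxx            # slope: male mean minus female mean
b0 = y_bar - b1 * x_bar   # intercept: female mean

residuals = [y - (b0 + b1 * x) for x, y in zip(gend, salary)]
s2 = sum(e * e for e in residuals) / (n - 2)  # residual variance, n - 2 df
se_b1 = sqrt(s2 / sxx)
t_b1 = b1 / se_b1

T_CRIT = 2.101  # two-sided 5% critical value of t with 18 df (standard tables)
reject_h0 = abs(t_b1) > T_CRIT
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, t = {t_b1:.3f}, reject H0: {reject_h0}")
```

Because the regression pools the two group variances, its t statistic need not match the unpooled two-sample value exactly.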
(5.c) Plot the residuals of the above analysis against the age, experience, and education variables, respectively. What can you conclude from these plots?
The residuals (the part of salary that cannot be explained by gender) have a positive relationship with age and experience and a negative relationship with education. This means the residuals can be further explained by these three variables.
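The direction of each relationship can also be summarised numerically. The sketch below assumes the columns as read from the table, computes the residuals of the gender-only model, and reports their Pearson correlation with AGE, EXP, and EDUC. The helper `pearson` and the variable names are illustrative, and a correlation is only a rough stand-in for inspecting the plots.

```python
from math import sqrt
from statistics import fmean

# Columns as read from the table above (same row order in every list).
salary = [22, 37, 60, 22, 76, 56, 32, 80, 68, 86,
          55, 91, 45, 83, 55, 37, 32, 45, 80, 70]
gend = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
        1, 1, 0, 1, 0, 1, 1, 1, 1, 1]
age = [20, 31, 44, 20, 55, 44, 25, 55, 50, 60,
       40, 64, 35, 64, 40, 31, 25, 35, 60, 50]
exp_years = [0, 4, 14, 0, 25, 14, 5, 25, 20, 30,
             15, 29, 10, 19, 15, 4, 5, 10, 21, 20]
educ = [2, 6, 6, 4, 2, 0, 4, 6, 2, 4,
        0, 0, 2, 2, 6, 4, 6, 2, 2, 4]

# For a 0/1 regressor the least-squares fit is the group mean, so each
# residual is the salary minus the mean salary of that employee's gender.
group_mean = {g: fmean(s for s, gg in zip(salary, gend) if gg == g)
              for g in (0, 1)}
residuals = [s - group_mean[g] for s, g in zip(salary, gend)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


corr = {name: pearson(residuals, col)
        for name, col in (("AGE", age), ("EXP", exp_years), ("EDUC", educ))}
for name, r in corr.items():
    print(f"corr(residual, {name}) = {r:+.3f}")
```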
(5.d) Fit another regression model by including all the available variables:
$$\text{Salary} = \beta_0 + \beta_1\,\text{GEND} + \beta_2\,\text{EXP} + \beta_3\,\text{AGE} + \beta_4\,\text{EDUC} + e \tag{5.2}$$
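One way to fit this model without a statistics package is to solve the normal equations directly. The sketch below assumes the rows as read from the table and uses a small hand-written Gaussian elimination; `solve` is a hypothetical helper, not a library call. Statistical software uses more numerically careful methods, so treat this as an illustration of the least-squares idea only.

```python
from statistics import fmean

# (SALARY, GEND, EXP, AGE, EDUC) for each employee, as read from the table.
records = [
    (22, 0, 0, 20, 2), (37, 0, 4, 31, 6), (60, 0, 14, 44, 6),
    (22, 1, 0, 20, 4), (76, 0, 25, 55, 2), (56, 1, 14, 44, 0),
    (32, 0, 5, 25, 4), (80, 1, 25, 55, 6), (68, 0, 20, 50, 2),
    (86, 1, 30, 60, 4), (55, 1, 15, 40, 0), (91, 1, 29, 64, 0),
    (45, 0, 10, 35, 2), (83, 1, 19, 64, 2), (55, 0, 15, 40, 6),
    (37, 1, 4, 31, 4), (32, 1, 5, 25, 6), (45, 1, 10, 35, 2),
    (80, 1, 21, 60, 2), (70, 1, 20, 50, 4),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Design matrix with an intercept column, in the order of model (5.2).
X = [[1.0, g, e, a, d] for _, g, e, a, d in records]
y = [float(r[0]) for r in records]
k = len(X[0])

# Normal equations: (X'X) beta = X'y.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
resid = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e * e for e in resid)
sst = sum((yi - fmean(y)) ** 2 for yi in y)
r_squared = 1 - sse / sst

for name, b in zip(("const", "GEND", "EXP", "AGE", "EDUC"), beta):
    print(f"{name:>5}: {b:8.4f}")
print(f"R-squared = {r_squared:.4f}")
```

The printed R-squared can be compared with the value quoted in (5.e).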
(5.e) Check the adequacy of this model.
R-squared is very high: 99.8% of the variation in salary is explained by these variables.
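Since R-squared never decreases when predictors are added, one common supplementary check is the adjusted R-squared. As a quick calculation, taking the reported value of 0.998 at face value with n = 20 employees and p = 4 predictors:

```latex
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
                 = 1 - (1 - 0.998)\,\frac{19}{15}
                 \approx 0.9975
```

The adjustment barely changes the figure, so the high R-squared here does not appear to be an artefact of the number of predictors.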