 58. |  5,079        1 |
 59. |  8,129        3 |
 60. |  4,296        1 |
     |-----------------|
 61. |  5,799        4 |
 62. |  4,499        1 |
 63. |  3,995        1 |
 64. | 12,990        2 |
 65. |  3,895        1 |
     |-----------------|
 66. |  3,798        1 |
 67. |  5,899        4 |
 68. |  3,748        1 |
 69. |  5,719        4 |
 70. |  7,140        3 |
     |-----------------|
 71. |  5,397        4 |
 72. |  4,697        1 |
 73. |  6,850        3 |
 74. | 11,995        2 |
     +-----------------+
. tabstat price mpg weight length, by(class2)
Summary statistics: mean
  by categories of: class2

  class2 |     price       mpg    weight    length
---------+----------------------------------------
 Class 1 |  4163.938   24.5625   2581.25  174.7813
 Class 2 |   12607.6        15      4041     209.1
 Class 3 |  8312.143        21  2931.429  188.5714
       4 |   5548.88     19.72    3196.4    196.12
---------+----------------------------------------
   Total |  6165.257   21.2973  3019.459  187.9324
--------------------------------------------------
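Next, reverse the direction and run a multivariate analysis of variance (MANOVA) on the same variables. This tests whether the classification is effective, that is, whether the group means differ. Every test statistic below has Prob>F = 0.0000, so p is far below 0.05 and the group mean vectors differ significantly. The clustering therefore succeeded.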
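Wilks' lambda (the W row of the MANOVA table) compares within-group scatter with total scatter: Λ = det(E) / det(E + H). Here E is the within-group sums-of-squares-and-cross-products (SSCP) matrix and H is the between-group SSCP matrix, so E + H is the total SSCP. Values near 0 mean the group means differ strongly, and a value of 1 means they do not differ at all. The pure-Python sketch below computes Λ for small illustrative data. It is not Stata's implementation, and it leaves out the F approximation that manova reports.

```python
def mean_vector(rows):
    """Column means of a list of equal-length observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sscp(rows, center):
    """Sums of squares and cross-products of deviations from `center`."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - c for x, c in zip(r, center)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[pivot][i] == 0:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            result = -result  # a row swap flips the sign
        result *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return result


def wilks_lambda(groups):
    """Wilks' lambda = det(E) / det(T), where T = E + H is the total SSCP."""
    everyone = [row for g in groups for row in g]
    total = sscp(everyone, mean_vector(everyone))
    p = len(total)
    within = [[0.0] * p for _ in range(p)]
    for g in groups:
        s = sscp(g, mean_vector(g))  # scatter around the group's own mean
        for i in range(p):
            for j in range(p):
                within[i][j] += s[i][j]
    return det(within) / det(total)


# Two well-separated groups of 2-D points: lambda is close to 0.
g1 = [[0, 0], [1, 0], [0, 1], [1, 1]]
g2 = [[10, 10], [11, 10], [10, 11], [11, 11]]
print(round(wilks_lambda([g1, g2]), 4))  # prints 0.005
```

If every group has the same mean vector, H is zero and the function returns 1.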
. manova price mpg weight length=class2
Number of obs = 74
                 W = Wilks' lambda      L = Lawley-Hotelling trace
                 P = Pillai's trace     R = Roy's largest root

    Source | Statistic     df   F(df1,    df2) =   F   Prob>F
-----------+--------------------------------------------------
    class2 | W   0.0549     3    12.0   177.6    29.52 0.0000 a
           | P   1.2125         12.0   207.0    11.70 0.0000 a
           | L  12.6572         12.0   197.0    69.26 0.0000 a
           | R  12.3147          4.0    69.0   212.43 0.0000 u
           |--------------------------------------------------
  Residual |               70
-----------+--------------------------------------------------
     Total |               73
--------------------------------------------------------------
           e = exact, a = approximate, u = upper bound on F

Re-cluster with k-medians, which uses medians rather than means as cluster centres. The group assignments change:
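k-medians differs from k-means only in its update step. Each cluster centre becomes the per-variable median of its members instead of the mean, which makes the centres less sensitive to extreme values. The Python sketch below illustrates the idea, assuming L1 (city-block) distance and caller-supplied starting centres. Stata's cluster kmedians has its own defaults for the distance measure and the starting values, so its assignments need not match this code.

```python
from statistics import median


def manhattan(a, b):
    """L1 (city-block) distance between two observation vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmedians(points, centers, max_iter=100):
    """Lloyd-style k-medians: assign each point to its nearest centre, then
    move each centre to the per-variable median of its members.
    Returns (labels, centers)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centers[k] = [median(col) for col in zip(*members)]
    return labels, centers


# The outlier [60, 60] moves the second centre only to 10.5 (a mean would give 22.75).
points = [[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10], [60, 60]]
labels, centers = kmedians(points, centers=[[0, 0], [12, 12]])
print(labels, centers)  # prints [0, 0, 0, 1, 1, 1, 1] [[1, 1], [10.5, 10.5]]
```

Cluster numbers are arbitrary labels, so two runs can describe similar partitions with different numbers.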
. cluster kmedians price mpg weight length, k(4) name(class3)
. list class2 class3
     +-----------------+
     | class2   class3 |
     |-----------------|
  1. |      1        2 |
  2. |      4        2 |
  3. |      1        3 |
  4. |      4        2 |
  5. |      3        1 |
     |-----------------|
  6. |      4        1 |
  7. |      1        3 |
  8. |      4        2 |
  9. |      2        4 |
 10. |      1        2 |
     |-----------------|
 11. |      2        4 |
 12. |      2        4 |
 13. |      2        4 |
 14. |      1        3 |
 15. |      4        2 |
     |-----------------|
 16. |      1        2 |
 17. |      4        2 |
 18. |      1        3 |
 19. |      1        2 |
 20. |      1        3 |
     |-----------------|
 21. |      1        2 |
 22. |      4        1 |
 23. |      4        1 |
 24. |      1        3 |
 25. |      1        3 |
     |-----------------|
 26. |      2        4 |
 27. |      2        4 |
 28. |      2        4 |
 29. |      1        3 |
 30. |      4        2 |
     |-----------------|
 31. |      4        1 |
 32. |      1        2 |
 33. |      4        1 |
 34. |      1        3 |
 35. |      3        4 |
     |-----------------|
 36. |      4        2 |
 37. |      4        2 |
 38. |      4        2 |
 39. |      1        2 |
 40. |      1        3 |
     |-----------------|
 41. |      2        4 |
 42. |      1        2 |
 43. |      1        3 |
 44. |      1        3 |
 45. |      4        1 |
     |-----------------|
 46. |      1        2 |
 47. |      4        1 |
 48. |      4        2 |
 49. |      4        2 |
 50. |      4        2 |
     |-----------------|
 51. |      1        2 |
 52. |      1        3 |
 53. |      3        4 |
 54. |      4        1 |
 55. |      3        4 |
     |-----------------|
 56. |      4        1 |
 57. |      1        3 |
 58. |      1        3 |
 59. |      3        1 |
 60. |      1        3 |
     |-----------------|
 61. |      4        1 |
 62. |      1        3 |
 63. |      1        3 |
 64. |      2        4 |
 65. |      1        3 |
     |-----------------|
 66. |      1        3 |
 67. |      4        1 |
 68. |      1        3 |
 69. |      4        1 |
 70. |      3        1 |
     |-----------------|
 71. |      4        1 |
 72. |      1        3 |
 73. |      3        1 |
 74. |      2        4 |
     +-----------------+
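Apply a stopping rule to the k-medians result. The rule used here is the Calinski-Harabasz pseudo-F statistic, where larger values indicate more distinct clusters. Because class3 was fitted with k(4), the statistic is reported for 4 clusters only.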
. cluster stop class3, rule(calinski)
+---------------------------+
|             |  Calinski/  |
|  Number of  |  Harabasz   |
|  clusters   |  pseudo-F   |
|-------------+-------------|
|      4      |    151.37   |
+---------------------------+
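The Calinski-Harabasz pseudo-F is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom: CH = [trace(B) / (k - 1)] / [trace(W) / (n - k)], for n observations in k clusters. The minimal Python sketch below applies that formula for illustration only. Stata computes the statistic internally, and the function name here is hypothetical.

```python
def calinski_harabasz(points, labels):
    """Pseudo-F = (between SS / (k - 1)) / (within SS / (n - k))."""
    n = len(points)
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n clusters")
    dim = len(points[0])
    grand = [sum(p[j] for p in points) / n for j in range(dim)]
    between = within = 0.0
    for members in groups.values():
        centre = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # between-cluster SS: size-weighted squared distance from centre to grand mean
        between += len(members) * sum((centre[j] - grand[j]) ** 2 for j in range(dim))
        # within-cluster SS: squared distances of members to their own centre
        within += sum((p[j] - centre[j]) ** 2 for p in members for j in range(dim))
    return (between / (k - 1)) / (within / (n - k))


# One variable, two tight groups: between SS = 100, within SS = 4.
print(calinski_harabasz([[0], [2], [10], [12]], [1, 1, 2, 2]))  # prints 50.0
```

A single value such as 151.37 is easiest to interpret alongside the values for other candidate numbers of clusters.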
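Hierarchical clustering: run average-linkage clustering, then draw a cut line across the dendrogram (the cluster tree) to assign each observation to a group. Here the tree is cut at a dissimilarity of 3500 with cut(3500), and the group numbers are stored in clus5.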
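Cutting a dendrogram at height h keeps every merge made at a dissimilarity of at most h and discards the rest. Each remaining branch becomes a group. The Python sketch below illustrates this for average linkage, where the distance between two clusters is the mean pairwise distance between their members. It assumes Euclidean distance on the raw values and keeps a merge that falls exactly at the cut height. Stata's own tie handling and group numbering may differ, and the function name is hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage_cut(points, cut):
    """Agglomerative clustering with average linkage (UPGMA), stopping once the
    closest pair of clusters is farther apart than `cut`. Returns group numbers 1..g."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # mean Euclidean distance over all cross-cluster pairs of observations
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if d > cut:  # every remaining merge lies above the cut line
            break
        clusters[x] += clusters[y]  # x < y, so deleting y leaves x in place
        del clusters[y]

    clusters.sort(key=min)  # number groups by their first observation
    labels = [0] * len(points)
    for g, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = g
    return labels


points = [[0], [1], [10], [11], [50]]
print(average_linkage_cut(points, cut=5))   # prints [1, 1, 2, 2, 3]
print(average_linkage_cut(points, cut=20))  # prints [1, 1, 1, 1, 2]
```

Average-linkage merge heights never decrease. Stopping at the first merge above the cut therefore gives the same groups as building the full tree and then cutting it.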
. cluster averagelinkage price mpg weight length
cluster name:  _clus_1
. cluster generate clus5 = cut(3500), name(_clus_1)
. list clus5
     +-------+
     | clus5 |
     |-------|
  1. |     1 |
  2. |     1 |
  3. |     1 |
  4. |     1 |
  5. |     2 |
     |-------|
  6. |     1 |
  7. |     1 |
  8. |     1 |
  9. |     2 |
 10. |     1 |
     |-------|
 11. |     2 |
 12. |     3 |
 13. |     3 |
 14. |     1 |
 15. |     1 |
     |-------|
 16. |     1 |
 17. |     1 |
 18. |     1 |
 19. |     1 |
 20. |     1 |
     |-------|
 21. |     1 |
 22. |     1 |
 23. |     1 |
 24. |     1 |
 25. |     1 |
     |-------|
 26. |     2 |
 27. |     3 |
 28. |     3 |
 29. |     1 |