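Efficiency comparison: Apriori must scan the database multiple times, whereas FP-growth needs only a single scan to build the FP-tree (after an initial pass that counts item supports). In Apriori, candidate generation is expensive (because of the join step), whereas FP-growth generates no candidates at all.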
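To make the cost of candidate generation concrete, here is a minimal sketch of the join-and-prune step that Apriori repeats at every level. The function name `apriori_gen` and the sample itemsets are illustrative assumptions, not taken from the exercise.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Join frequent k-itemsets that share their first k-1 items, then prune
    any candidate that has an infrequent k-subset (hypothetical helper)."""
    frequent = set(frequent_k)
    ordered = sorted(frequent)          # itemsets are sorted tuples
    candidates = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a[:-1] == b[:-1]:        # join step: same (k-1)-prefix
                cand = a + (b[-1],)
                # prune step: every k-subset must itself be frequent
                if all(sub in frequent for sub in combinations(cand, len(a))):
                    candidates.append(cand)
    return candidates

# Illustrative frequent 2-itemsets (made up for this sketch)
frequent_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
candidates_3 = apriori_gen(frequent_2)
print(candidates_3)
```

Each surviving candidate still has to be counted with another database scan, which is the repeated cost FP-growth avoids.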
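(b) List all strong association rules (with support s and confidence c) that match the following metarule, where X is a variable representing customers and item_i denotes a variable representing items (e.g., "A", "B", etc.):
Answer: k, o => e [0.6, 1]
e, o => k [0.6, 1]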
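5.5 A database has four transactions. Let min_sup = 60% and min_conf = 80%.
(a) At the granularity of item_category (e.g., item_i could be "Milk"), for the following rule template,
list the frequent k-itemsets for the largest k, and all strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.
(b) At the granularity of brand-item_category (e.g., item_i could be "Sunset-Milk"), for the following rule template,
list the frequent k-itemsets for the largest k (but do not print any rules).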
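5.10 Suppose that a data relation describing students at Big University has been generalized to the generalized relation R in Table 5-13. (See p. 179 for the problem statement.)
(a) Draw the concept hierarchies for status, major, age, and nationality. Students can easily sketch the corresponding concept hierarchies.
(b) Write a program that uses uniform support for all levels; see p. 179 for details.
(c) Use level-cross filtering by single item; see p. 179 for details.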
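5.14 The following contingency table summarizes supermarket transaction data, where "hot dogs" denotes transactions that contain hot dogs, "not hot dogs" denotes transactions that do not contain hot dogs, "hamburgers" denotes transactions that contain hamburgers, and "not hamburgers" denotes transactions that do not contain hamburgers.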
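(a) Suppose the association rule hot dogs => hamburgers is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
Answer: For this rule, support = 2000/5000 = 40% and confidence = 2000/3000 = 66.7%. Both values exceed their thresholds (40% > 25% and 66.7% > 50%), so the rule is strong.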
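The figures for this exercise can be rechecked with a short sketch. The contingency table itself is not reproduced in this text, so the counts below are assumptions read off the answers: 2000 transactions with both items, 3000 with hot dogs (the confidence denominator), 2500 with hamburgers (from P(hamburger) = 0.5), and 5000 in total. The rule direction hot dogs => hamburgers is likewise inferred from the confidence denominator.

```python
# Counts inferred from the worked answers (assumptions, not the original table)
both = 2000          # transactions with hot dogs and hamburgers
hot_dogs = 3000      # transactions with hot dogs
hamburgers = 2500    # transactions with hamburgers
total = 5000         # all transactions

min_sup, min_conf = 0.25, 0.50   # thresholds stated in the exercise

support = both / total                       # P(hot dogs and hamburgers)
confidence = both / hot_dogs                 # P(hamburgers | hot dogs)
lift = support / ((hot_dogs / total) * (hamburgers / total))

is_strong = support >= min_sup and confidence >= min_conf
print(f"support={support:.2f} confidence={confidence:.3f} "
      f"lift={lift:.2f} strong={is_strong}")
```

A lift above 1 indicates positive correlation, a lift of exactly 1 indicates independence, and a lift below 1 indicates negative correlation.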
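(b) Based on the given data, is the purchase of hot dogs independent of the purchase of hamburgers? If not, what kind of correlation relationship exists between the two?
Answer: corr(hot dog, hamburger) = P({hot dog, hamburger}) / (P({hot dog}) × P({hamburger})) = 0.4 / (0.6 × 0.5) = 1.33 > 1. So the purchase of hot dogs is not independent of the purchase of hamburgers; the two are positively correlated.
6.1 Briefly outline the major steps of decision tree classification.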
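6.6 Given a 5 GB data set with 50 attributes (each containing 100 distinct values) and a desktop machine with 512 MB of memory, outline an efficient method for constructing decision trees on such a large data set. Justify your answer with a rough calculation of main memory usage.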
We will use the RainForest algorithm for this problem. Assume there are C class labels. The most memory is required for the AVC-set at the root of the tree. To compute the AVC-set for the root node, we scan the database once and construct the AVC-list for each of the 50 attributes. The size of each AVC-list is 100 × C, so the total size of the AVC-set is 100 × C × 50, which will easily fit into 512 MB of memory for a reasonable C. The AVC-sets of other nodes are computed in a similar way, but they will be smaller because fewer attributes are available. To reduce the number of scans, we can compute the AVC-sets for nodes at the same level of the tree in parallel. With such small AVC-sets per node, we can probably fit a whole level in memory.
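The rough memory estimate for the root AVC-set can be written out as a small calculation. The answer leaves C and the counter width open, so the values used here (C = 10 classes, 4 bytes per counter) are purely illustrative assumptions, as is the helper name `avc_set_bytes`.

```python
def avc_set_bytes(num_attributes, values_per_attribute, num_classes,
                  bytes_per_counter=4):
    """Size of one node's AVC-set: one counter per
    (attribute, attribute value, class label) triple."""
    return (num_attributes * values_per_attribute
            * num_classes * bytes_per_counter)

# 50 attributes with 100 distinct values each come from the exercise;
# 10 classes and 4-byte counters are assumptions for this sketch.
root_bytes = avc_set_bytes(50, 100, 10)
memory_bytes = 512 * 1024 * 1024

print(f"root AVC-set: {root_bytes} bytes "
      f"({root_bytes / memory_bytes:.4%} of 512 MB)")
```

Under these assumptions the root AVC-set occupies about 200 KB, which leaves room for the AVC-sets of many nodes at the same tree level.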
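6.11 The following table consists of training data from an employee database. The data have been generalized. For example, "31...35" for age represents the age range of 31 to 35. For a given row, count represents the number of data tuples having the values for department, status, age, and salary given in that row. Let status be the class label attribute.
(a) How would you modify the basic decision tree algorithm to take into consideration the count of each generalized data tuple (i.e., of each row)?
(b) Use your modified algorithm to construct a decision tree from the given data.
(c) Given a data tuple with the values "systems", "26...30", and "46K...50K" for the attributes department, age, and salary, respectively, what would a naive Bayesian classification of the status for the tuple be?
(d) Design a multilayer feed-forward neural network for the given data. Label the nodes in the input and output layers.
(e) Using the multilayer feed-forward neural network obtained above, show the weight values after one iteration of the backpropagation algorithm, given the training instance (sales, senior, 31...35, 46K...50K). Indicate the initial weights and biases and the learning rate you used.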
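For part (a), one common way to account for count is to weight every row by its count when computing class probabilities, so entropy and information gain use summed counts instead of row tallies. The sketch below assumes that approach; the training table is not reproduced in this text, so the rows are hypothetical toy data and the function names are illustrative.

```python
from collections import defaultdict
from math import log2

def weighted_entropy(rows, class_index, count_index):
    """Entropy of the class attribute, with each row weighted by its count."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[class_index]] += row[count_index]
    n = sum(totals.values())
    return -sum((c / n) * log2(c / n) for c in totals.values() if c > 0)

def info_gain(rows, attr_index, class_index, count_index):
    """Information gain of splitting on one attribute, using counts as weights."""
    total = sum(row[count_index] for row in rows)
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row)
    remainder = sum(
        sum(r[count_index] for r in part) / total
        * weighted_entropy(part, class_index, count_index)
        for part in partitions.values()
    )
    return weighted_entropy(rows, class_index, count_index) - remainder

# Hypothetical generalized rows: (department, status, count)
toy_rows = [
    ("sales", "senior", 30),
    ("sales", "junior", 10),
    ("systems", "junior", 40),
]
gain_department = info_gain(toy_rows, attr_index=0, class_index=1, count_index=2)
print(round(gain_department, 4))
```

The same weighting carries over to the naive Bayesian estimate in part (c), where each conditional probability becomes a ratio of summed counts.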
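6.12 The support vector machine (SVM) is a highly accurate classification method. However, SVM training is very slow on large sets of data tuples. Discuss how to overcome this difficulty and develop an SVM algorithm that is efficient for large data sets.
7.1 Briefly describe how to compute the dissimilarity between objects described by the following types of variables:
(a) Numerical (interval-scaled) variables
(b) Asymmetric binary variables
(c) Categorical variables
(d) Ratio-scaled variables
(e) Nonmetric vector objects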
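7.2 Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28, standardize the variable by the following methods:
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using p = 3.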
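All three distances are special cases of the Minkowski distance: p = 1 gives Manhattan and p = 2 gives Euclidean. The two objects themselves are not shown in this text, so the tuples below are hypothetical stand-ins chosen only to illustrate the computation.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length tuples."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Hypothetical objects (the original pair is not given in this text)
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

manhattan = minkowski(x, y, 1)    # p = 1
euclidean = minkowski(x, y, 2)    # p = 2
minkowski_3 = minkowski(x, y, 3)  # p = 3, as requested in part (c)

print(f"Manhattan={manhattan:.3f} Euclidean={euclidean:.3f} "
      f"Minkowski(p=3)={minkowski_3:.3f}")
```

As p grows, the largest coordinate difference dominates, so the distance decreases toward the maximum per-coordinate difference.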
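7.6 Suppose that the data mining task is to cluster the following eight points (with (x, y) representing location) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show only
(a) the three cluster centers after the first round of execution, and
(b) the final three clusters.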