http://www.dsi.uminho.pt/~pcortez/forestfires).
75. Function Finding: Cases collected mostly from investigations in physical science; the intention is to evaluate function-finding algorithms
76. Gisette: GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection challenge.
77. Glass Identification: From USA Forensic Science Service; 6 types of glass; defined in terms of their oxide content (e.g. Na, Fe, K)
78. Haberman's Survival: Dataset contains cases from a study conducted on the survival of patients who had undergone surgery for breast cancer
79. Hayes-Roth: Topic: human subjects study
80. Heart Disease: 4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach
81. Hepatitis: From G. Gong (CMU); mostly Boolean or numeric-valued attribute types; includes cost data (donated by Peter Turney)
82. Hill-Valley: Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y co-ordinate, the points will create either a Hill (a "bump" in the terrain) or a Valley (a "dip" in the terrain).
83. Horse Colic: Well documented attributes; 368 instances with 28 attributes (continuous, discrete, and nominal); 30% missing values
84. Housing: Taken from the StatLib library
85. ICU: Data set prepared for the use of participants in the 1994 AAAI Spring Symposium on Artificial Intelligence in Medicine.
86. Image Segmentation: Image data described by high-level numeric-valued attributes, 7 classes
87. Insurance Company Benchmark (COIL 2000): This data set, used in the CoIL 2000 Challenge, contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data
88. Internet Advertisements: This dataset represents a set of possible advertisements on Internet pages.
89. Internet Usage Data: This data contains general demographic information on internet users in 1997.
90. Ionosphere: Classification of radar returns from the ionosphere
91. IPUMS Census Database: This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990.
92. Iris: Famous database; from Fisher, 1936
93. ISOLET: Goal: Predict which letter-name was spoken; a simple classification task.
94. Japanese Credit Screening: Includes domain theory (generated by talking to Japanese domain experts); data in Lisp
95. Japanese Vowels: This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers.
96. KDD Cup 1998 Data: This is the data set used for The Second International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-98
97. KDD Cup 1999 Data: This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99
98. Kinship: Relational dataset
99. Labor Relations: From Collective Bargaining Review
100. LED Display Domain: From the book Classification and Regression Trees; 2 C programs for generating sample databases are provided here
101. Lenses: Database for fitting contact lenses
102. Letter Recognition: Database of character image features; try to identify the letter
103. Libras Movement: The data set contains 15 classes of 24 instances each. Each class refers to a hand movement type in LIBRAS (Portuguese name 'LÍngua BRAsileira de Sinais', the official Brazilian sign language).
104. Liver Disorders: BUPA Medical Research Ltd. database donated by Richard S. Forsyth
105. Localization Data for Person Activity: Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing the same scenario five times.
106. Logic Theorist: All code for Logic Theorist
107. Low Resolution Spectrometer: From IRAS data (NASA Ames Research Center)
108. Lung Cancer: Lung cancer data; no attribute definitions
109. Lymphography: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. (Restricted access)
110. M. Tuberculosis Genes: Data giving characteristics of each ORF (potential gene) in the M. tuberculosis bacterium. Sequence, homology (similarity to other genes), structural information, and function (if known) are provided
111. Madelon: MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear.
112. MAGIC Gamma Telescope: Data are Monte Carlo (MC) generated to simulate registration of high energy gamma particles in an atmospheric Cherenkov telescope
113. Mammographic Mass: Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age.
114. Mechanical Analysis: Fault diagnosis problem of electromechanical devices; the PUMPS DATA SET is a newer version with domain theory and results
115. Meta-data: Meta-data was used to give advice about which classification method is appropriate for a particular dataset (taken from results of the Statlog project).