Structural Knowledge Discovery Used to Analyze Earthquake Activity

Jesus A. Gonzalez, Lawrence B. Holder and Diane J. Cook

Department of Computer Science and Engineering

University of Texas at Arlington

Box 19015, Arlington, TX 76019-0015

{gonzalez,holder,cook}@cse.uta.edu

Abstract

The Subdue structural discovery system is being used as the Data Mining tool to study the "Orizaba Fault" located in Mexico, as part of a research project of the geologist Dr. Burke Burkart. We analyze the information in the Earthquake Database to determine whether the earthquake activity in the area is related to the fault. We experimented with different samples of data, mainly using two heuristics to guide Subdue through the substructure discovery process. We also added some spatio-temporal information as background knowledge. The results show how Subdue can be used successfully as a Data Mining tool in real-world domains.

Introduction

The advancement of technology has allowed not only the automation of complex processes but also the accumulation of large amounts of process information in databases. But having the information is useless if we do not take advantage of it and learn from it by extracting knowledge that helps to improve a process or to identify a possible failure manifested in the stored information. However, this is a difficult task to achieve using standard tools because of the large amount and complexity of the data.

That is why different approaches in the field of Knowledge Discovery (Fayyad et al. 1996) have been developed to extract hidden information from those databases. In this project we use the Knowledge Discovery process (Fayyad et al. 1996) and a specific Data Mining tool applied to a real-world domain problem. The Data Mining tool is the Subdue program, and the domain is the Earthquake database, which consists of reports of earthquakes. For this domain we worked with a geology expert, Dr. Burke Burkart, who helped us to analyze the results and to guide the geology-related research.

We experimented with different samples of data, mainly using two heuristics to guide Subdue through the substructure discovery process. We also added some geographical and time knowledge to connect earthquakes that occurred close to each other in time and distance. The results show how Subdue was able to find patterns with a logical interpretation, and how it can be used as a research tool in the geological domain.

Copyright © 2000, American Association for Artificial Intelligence. All rights reserved.

Substructure Discovery Using Subdue

Subdue (Cook, Holder and Djoko 1995) is a Data Mining tool that performs clustering using an algorithm categorized as an example-based and relational learning method. The tool was first developed in 1990 and has since been expanded and optimized to generate better results. It is a general tool that can be applied to any domain that can be represented as a graph. Subdue has been used successfully in several domains, such as CAD circuit analysis, chemical compound analysis, and scene analysis (Cook, Holder and Djoko 1996; Cook, Holder and Djoko 1995; Cook and Holder 1994; Chittimoori, Gonzalez and Holder 1999; Djoko, Cook and Holder 1995).
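As an illustration of what "represented as a graph" can mean for earthquake reports, the following is a minimal sketch in Python. It is not Subdue's input format. The record fields, the labels "near_in_distance" and "near_in_time", and the thresholds MAX_KM and MAX_DAYS are all assumptions chosen for the example.

```python
import math
from itertools import combinations

# Hypothetical records: (id, latitude, longitude, day, magnitude class).
EVENTS = [
    ("e1", 18.85, -97.10, 0, "moderate"),
    ("e2", 18.90, -97.05, 2, "minor"),
    ("e3", 16.00, -95.00, 100, "minor"),
]

MAX_KM = 75.0   # assumed distance threshold, not taken from the paper
MAX_DAYS = 36   # assumed time threshold, not taken from the paper


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    radius = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def build_graph(events):
    """Return (vertices, edges) for a labeled graph of the events."""
    vertices = {}  # vertex id -> label
    edges = []     # (source id, target id, edge label)
    for eid, _lat, _lon, _day, mag in events:
        vertices[eid] = "event"
        vertices[eid + "_mag"] = mag
        edges.append((eid, eid + "_mag", "magnitude"))
    # Background knowledge: link events that are close in space or time.
    for a, b in combinations(events, 2):
        if distance_km(a[1], a[2], b[1], b[2]) <= MAX_KM:
            edges.append((a[0], b[0], "near_in_distance"))
        if abs(a[3] - b[3]) <= MAX_DAYS:
            edges.append((a[0], b[0], "near_in_time"))
    return vertices, edges


vertices, edges = build_graph(EVENTS)
```

In this toy data only e1 and e2 end up connected, since e3 is far from both in distance and in time.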

Subdue implements two model evaluation criteria as a means to decide which patterns are going to be chosen as important knowledge or structures. The first model evaluation method, called “Minimum Encoding”, is a technique derived from the minimum description length (MDL) principle (Cook and Holder 1994). It chooses as best the substructures that minimize the description length metric, that is, the length in bits of the graph representation. The number of bits is calculated based on the size of the adjacency matrix representation of the graph. Accordingly, the best substructure is the one that minimizes I(S) + I(G|S), where I(S) is the number of bits required to describe substructure S, and I(G|S) is the number of bits required to describe graph G after being compressed by substructure S. The second method chooses the substructures according to how well they compress the graph in terms of its size in number of vertices and edges. Another method used consists of finding large substructures in spite of their low number of instances.

The main discovery algorithm is a computationally constrained beam search. The algorithm begins with the substructure matching a single vertex in the graph. In each iteration the algorithm selects the best substructure and incrementally expands the instances of the substructure. The algorithm searches for the best substructure until all possible substructures have been considered or the total amount of computation exceeds a given limit. Each substructure is evaluated by how well it compresses the input graph according to the heuristic being used (MDL or graph size). The best substructure found by Subdue can be used to compress the input graph, which can then be input to another iteration of Subdue. After several iterations, Subdue builds a
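The graph-size evaluation described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not Subdue's implementation: the total size(S) + size(G|S) is formed by analogy with I(S) + I(G|S), instances are given up front rather than discovered by the beam search, they are assumed not to overlap, and the function names and toy graph are hypothetical.

```python
def graph_size(num_vertices, num_edges):
    """Size of a graph measured as vertices plus edges."""
    return num_vertices + num_edges


def compress(vertices, edges, instances):
    """Collapse each instance (a set of vertex ids) into one new vertex.

    Simplification: instances must not overlap, and every edge with both
    endpoints inside one instance is treated as part of that instance.
    """
    owner = {}
    for i, instance in enumerate(instances):
        for v in instance:
            owner[v] = f"S{i}"
    new_vertices = {owner.get(v, v) for v in vertices}
    new_edges = [
        (owner.get(u, u), owner.get(v, v), label)
        for u, v, label in edges
        if not (u in owner and owner[u] == owner.get(v))
    ]
    return new_vertices, new_edges


def total_size(vertices, edges, sub_vertices, sub_edges, instances):
    """size(S) + size(G|S); smaller means better compression."""
    cv, ce = compress(vertices, edges, instances)
    return graph_size(sub_vertices, sub_edges) + graph_size(len(cv), len(ce))


# Hypothetical toy graph: three events, each with a magnitude vertex.
VERTICES = {"e1", "m1", "e2", "m2", "e3", "m3"}
EDGES = [
    ("e1", "m1", "magnitude"),
    ("e2", "m2", "magnitude"),
    ("e3", "m3", "magnitude"),
    ("e1", "e2", "near_in_time"),
    ("e2", "e3", "near_in_time"),
]
# Candidate substructure: event --magnitude--> value (2 vertices, 1 edge).
INSTANCES = [{"e1", "m1"}, {"e2", "m2"}, {"e3", "m3"}]

before = graph_size(len(VERTICES), len(EDGES))
after = total_size(VERTICES, EDGES, 2, 1, INSTANCES)
```

For this toy graph the candidate reduces the total from 11 (6 vertices plus 5 edges) to 8 (3 for the substructure plus 5 for the compressed graph), so under this measure it would be preferred over a candidate that leaves the size unchanged.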

