Lucene Full-Text Search Essentials (2)

2019-08-31 12:51

article.getTitle(), Store.YES, Index.ANALYZED)). The third and fourth parameters have the following meanings:

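Enum type   Constant       Meaning
Store       NO             Does not store the field's value
Store       YES            Stores the field's value
Index       NO             Does not build an index for the field
Index       ANALYZED       Tokenizes the value, then indexes it
Index       NOT_ANALYZED   Does not tokenize; indexes the whole value as a single term

Note: Store determines whether the original content of the field is available in the search results. Index determines whether the field cannot be searched at all (NO), can be searched by the individual words it contains (ANALYZED), or can only be matched as one whole term (NOT_ANALYZED).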
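The effect of the two options can be illustrated with a toy model. The sketch below is not Lucene code: FieldOptionsDemo and its methods are hypothetical names, and the "analysis" is just lower-casing and splitting on whitespace. It only mirrors the rules in the table above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not the Lucene API) of the Store and Index options.
class FieldOptionsDemo {
    enum Store { YES, NO }
    enum Index { NO, ANALYZED, NOT_ANALYZED }

    // field name -> (term -> IDs of the documents containing it)
    private final Map<String, Map<String, Set<Integer>>> terms = new HashMap<>();
    // document ID -> (field name -> original value); filled only for Store.YES
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void add(int docId, String field, String value, Store store, Index index) {
        if (store == Store.YES) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
        if (index == Index.ANALYZED) {
            // Tokenize: every word becomes a searchable term.
            for (String word : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                post(field, word, docId);
            }
        } else if (index == Index.NOT_ANALYZED) {
            // No tokenization: the whole value is a single term.
            post(field, value, docId);
        }
        // Index.NO: nothing is posted, so the field cannot be searched.
    }

    private void post(String field, String term, int docId) {
        terms.computeIfAbsent(field, k -> new HashMap<>())
             .computeIfAbsent(term, k -> new TreeSet<>())
             .add(docId);
    }

    // Exact term lookup in one field.
    Set<Integer> search(String field, String term) {
        return terms.getOrDefault(field, Collections.emptyMap())
                    .getOrDefault(term, Collections.emptySet());
    }

    // Returns the original value, or null if the field was not stored.
    String storedValue(int docId, String field) {
        return stored.getOrDefault(docId, Collections.emptyMap()).get(field);
    }
}
```

With this model, a title added as ANALYZED can be found by any single word in it, a link added as NOT_ANALYZED only by its full value, and a field added with Store.NO returns null from storedValue even when it is searchable.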

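2.4.2 How a search in the index is executed

A search first looks in the term dictionary to obtain the list of document IDs that match the condition, and only then uses those IDs to fetch the actual data (Document).

[Figure: the search execution process]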
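The two phases described above (dictionary lookup, then fetching by document ID) can be sketched with a tiny in-memory inverted index. This is an illustration only, not how Lucene is implemented: TinyInvertedIndex, analyze, lookup, and fetch are hypothetical names, and analyze is a crude stand-in for an Analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index (hypothetical, not the Lucene API).
class TinyInvertedIndex {
    // Term dictionary: term -> IDs of the documents that contain it.
    private final Map<String, List<Integer>> dictionary = new HashMap<>();
    // Document store: the document ID is the position in this list.
    private final List<String> documents = new ArrayList<>();

    // Stand-in for an Analyzer: lower-case, split on non-letter/non-digit runs.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Indexing: store the document and post each distinct term once.
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        for (String term : new LinkedHashSet<>(analyze(text))) {
            dictionary.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Phase 1: consult only the term dictionary and return document IDs.
    // The query word goes through the same analyze() used at index time.
    List<Integer> lookup(String queryWord) {
        List<String> terms = analyze(queryWord);
        if (terms.isEmpty()) {
            return Collections.emptyList();
        }
        return dictionary.getOrDefault(terms.get(0), Collections.emptyList());
    }

    // Phase 2: fetch the actual document for one ID.
    String fetch(int docId) {
        return documents.get(docId);
    }
}
```

Because lookup runs the query word through the same analyze method used by add, a query for "LUCENE" finds documents indexed with "Lucene". If the two sides normalized text differently, the dictionary lookup would miss, which is the reason the analyzers must agree.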

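1. Convert the query string (the search condition) into a Query object. This is much like Hibernate, where an HQL string is first turned into a Hibernate Query object by calling Session.createQuery(hql). The conversion is done with QueryParser or MultiFieldQueryParser. The query string also passes through an Analyzer (tokenizer), and the Analyzer used at search time must be the same one that was used when the index was built; otherwise the search may not return the correct results.

2. Call IndexSearcher.search() to run the query and get the result. The method returns a TopDocs object, which bundles several pieces of information about the result: totalHits, the total number of matching records, and an array of ScoreDoc. Each ScoreDoc holds the relevance score, the document ID, and related information for one hit.

3. Fetch the list of data that is actually needed. Call IndexSearcher.doc(scoreDoc.doc) to retrieve the Document for a given ID. This matters for paging, where only one page of data is fetched at a time.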
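The paging remark in step 3 comes down to simple index arithmetic over the array of hits. The sketch below uses plain int document IDs in place of ScoreDoc objects; HitPager, hitsNeeded, and page are hypothetical helper names, and the 0-based page index is an assumption.

```java
import java.util.Arrays;

// Paging arithmetic over a ranked list of hits (hypothetical helper, plain Java).
class HitPager {
    // Number of top hits that must be requested so that the given
    // 0-based page is fully covered.
    static int hitsNeeded(int pageIndex, int pageSize) {
        return (pageIndex + 1) * pageSize;
    }

    // Returns the document IDs that belong to one page. Only these IDs
    // would then be passed on to fetch the full documents.
    static int[] page(int[] hitDocIds, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        // Use long arithmetic so that large page numbers cannot overflow.
        long from = Math.min((long) pageIndex * pageSize, hitDocIds.length);
        long to = Math.min(from + pageSize, hitDocIds.length);
        return Arrays.copyOfRange(hitDocIds, (int) from, (int) to);
    }
}
```

In a Lucene program the same window would be applied to the ScoreDoc array of the returned TopDocs, calling IndexSearcher.doc(scoreDoc.doc) only for the entries inside the page, so that documents outside the page are never loaded.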

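3 Lucene index operations (CRUD)

3.1 Workflow of a full-text search program

As the figure above shows, searching is not the only task: the data set and the index must also be kept consistent. Developing a full-text search feature therefore involves two things: managing the index (maintaining the data in it) and searching the index. Lucene is the tool for working with the index.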

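3.2 Operating on the index with the Lucene API

The index is a directory containing binary files. It resembles a database, whose data is also kept as files in the file system. These binary files cannot be manipulated directly; the operations are performed through the API that Lucene provides, just as a database is operated through SQL statements.

Operations on the index fall into two kinds: management and querying. The index is managed with IndexWriter and queried with IndexSearcher. Lucene's data structures are Document and Field. A Document represents one record, and a Field represents one attribute of that record. One Document contains multiple Fields. Field values are mostly Strings, because Lucene is built around text, although numeric field types such as the IntField used later in this section also exist.

All that is needed is to convert the objects in our program into Documents; they can then be handed over to Lucene. The list of data in a search result is likewise a collection of Documents.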

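3.3 Add (C) operation

3.3.1 Prepare the POJO first

public class Article {
    private int id;          // ID
    private String title;    // title
    private String author;   // author
    private String content;  // content
    private String link;     // link

    // The getters and setters used by the later code (getId, setId, ...)
    // are not shown in the source.
}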

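3.3.2 Perform the add operation

public void addIndex(Article article) throws IOException {
    // Obtain a writer, convert the object to a Document, and add it to the index.
    IndexWriter indexWriter = LuceneUtil.getIndexWriter();
    Document doc = ArticleUtil.articleToDocument(article);
    indexWriter.addDocument(doc);
    // Closing the writer commits the change and releases the index lock.
    indexWriter.close();
}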

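As the code above shows, we need to define a LuceneUtil utility class ourselves, whose purpose is to make it easy to obtain an IndexWriter and an IndexSearcher. We also need an ArticleUtil utility class, whose purpose is to convert an object into a Document and a Document back into an object.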

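3.3.3 Write the LuceneUtil utility class

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class LuceneUtil {
    private static Directory directory = null;
    private static IndexWriterConfig indexWriterConfig = null;
    private static Version matchVersion = null;
    private static Analyzer analyzer = null;

    static {
        try {
            // Constants.INDEX_URL is a user-defined constant: public static final
            // String INDEX_URL=\
            directory = FSDirectory.open(new File(Constants.INDEX_URL));
            matchVersion = Version.LUCENE_44;
            analyzer = new StandardAnalyzer(matchVersion);
            indexWriterConfig = new IndexWriterConfig(matchVersion, analyzer);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Get an IndexWriter
    public static IndexWriter getIndexWriter() throws IOException {
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        return indexWriter;
    }

    // Get an IndexSearcher
    public static IndexSearcher getIndexSearcher() throws IOException {
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        return indexSearcher;
    }

    public static Version getMatchVersion() {
        return matchVersion;
    }

    public static Analyzer getAnalyzer() {
        return analyzer;
    }
}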

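3.3.4 Write the ArticleUtil utility class

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class ArticleUtil {
    // The field-name string literals are missing from the source text. The names
    // "id", "author", "link", "title", and "content" used below are assumed from
    // the matching getters and setters.

    // Convert an object into a Document
    public static Document articleToDocument(Article article) {
        Document document = new Document();
        document.add(new IntField("id", article.getId(), Store.YES));
        document.add(new StringField("author", article.getAuthor(), Store.YES));
        document.add(new StringField("link", article.getLink(), Store.YES));
        document.add(new TextField("title", article.getTitle(), Store.YES));
        document.add(new TextField("content", article.getContent(), Store.YES));
        return document;
    }

    // Convert a Document into an object
    public static Article documentToArticle(Document document) {
        Article article = new Article();
        article.setId(Integer.parseInt(document.get("id")));
        article.setAuthor(document.get("author"));
        article.setTitle(document.get("title"));
        article.setLink(document.get("link"));
        article.setContent(document.get("content"));
        return article;
    }
}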
