The Best Regular Expression Tutorial 3(6)

2019-02-15 11:13

Pattern.CASE_INSENSITIVE);

Compiling and running this test harness produces the following output:

Enter your regex: dog

Enter input string to search: DoGDOg

I found the text "DoG" starting at index 0 and ending at index 3.

I found the text "DOg" starting at index 3 and ending at index 6.

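As you can see, the string literal "dog" matches both occurrences regardless of case. To compile a pattern with multiple flags, separate the flags with the bitwise OR operator "|". For clarity, the following code sample hardcodes the regular expression instead of reading it from the console: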

pattern = Pattern.compile(\Pattern.UNIX_LINES);

An int variable can be used instead:

final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
Pattern pattern = Pattern.compile(\
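Combining flags can be sketched as a small runnable program. The class name FlagsDemo, the helper countMatches, and the sample regex and input are assumptions made for illustration; they are not taken from the tutorial. The sketch relies on the documented behavior that CASE_INSENSITIVE alone folds only US-ASCII letters, while adding UNICODE_CASE extends case folding to other Unicode letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {

    // Counts how often the regex, compiled with the given flags, is found in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: a lowercase e-acute as the regex,
        // an uppercase E-acute as the input.
        String regex = "\u00e9";
        String input = "\u00c9";

        // One flag: case folding is limited to US-ASCII letters.
        System.out.println(countMatches(regex, Pattern.CASE_INSENSITIVE, input));

        // Two flags combined with the bitwise OR operator.
        final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(countMatches(regex, flags, input));
    }
}
```

Keeping the flags in an int constant makes it easy to reuse the same combination for several patterns.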

8.2 Embedded Flag Expressions

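Flags can also be enabled with embedded flag expressions. Embedded flag expressions are an alternative to the two-argument version of compile, because they are specified in the regular expression itself. The following example uses the original test harness (RegexTestHarness.java) with the embedded flag expression (?i) to enable case-insensitive matching.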

Enter your regex: (?i)foo

Enter input string to search: FOOfooFoOfoO

I found the text "FOO" starting at index 0 and ending at index 3.

I found the text "foo" starting at index 3 and ending at index 6.

I found the text "FoO" starting at index 6 and ending at index 9.

I found the text "foO" starting at index 9 and ending at index 12.

Once again, every match succeeds regardless of case.
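The equivalence between the embedded expression and the compile flag can be checked with a short sketch. The class EmbeddedFlagDemo and its helper findAll are hypothetical names introduced here; only Pattern, Matcher, and the (?i) expression come from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedFlagDemo {

    // Collects every substring of the input that the regex matches.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "FOOfooFoOfoO";

        // Flag embedded in the regular expression itself.
        List<String> embedded = findAll("(?i)foo", 0, input);

        // Same flag passed to the two-argument compile method.
        List<String> viaFlag = findAll("foo", Pattern.CASE_INSENSITIVE, input);

        System.out.println(embedded);
        System.out.println(embedded.equals(viaFlag));
    }
}
```

The embedded form is handy when the regular expression arrives as a plain string, for example from user input, and there is no opportunity to pass flags to compile.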

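The embedded flag expressions that correspond to the publicly accessible fields of Pattern are shown in the following table:

Constant                    Equivalent embedded flag expression
Pattern.CANON_EQ            None
Pattern.CASE_INSENSITIVE    (?i)
Pattern.COMMENTS            (?x)
Pattern.MULTILINE           (?m)
Pattern.DOTALL              (?s)
Pattern.LITERAL             None
Pattern.UNICODE_CASE        (?u)
Pattern.UNIX_LINES          (?d)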

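8.3 Using the matches(String, CharSequence) Method

The Pattern class defines a convenient matches method for quickly checking whether a given input string matches a pattern. As with all public static methods, matches should be invoked through its class name, for example Pattern.matches("\\d", "1"). In this example the method returns true, because the digit "1" matches the regular expression \d.

8.4 Using the split(String) Method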

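The split method is a useful tool for gathering the text that lies on either side of the matched pattern. As shown below in SplitDemo.java, split can extract the words "one two three four five" from the string "one:two:three:four:five":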

import java.util.regex.Pattern;

public class SplitDemo {

    // The delimiter: a literal colon.
    private static final String REGEX = ":";
    private static final String INPUT = "one:two:three:four:five";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // Split the input around every match of the pattern.
        String[] items = p.split(INPUT);
        for (String s : items) {
            System.out.println(s);
        }
    }
}

Output:

one
two
three
four
five

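For simplicity, the example above matches a string literal, the colon (:), rather than a complex regular expression. Because Pattern and Matcher objects are still being used, split can return the text on either side of any regular expression. SplitDemo2.java below is the same example, modified to split on digits instead: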

import java.util.regex.Pattern;

public class SplitDemo2 {

    private static final String REGEX = \
    private static final String INPUT = \

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for (String s : items) {
            System.out.println(s);
        }
    }
}

Output:

one
two
three
four
five
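Since the string literals in the listing for SplitDemo2.java did not survive, here is a self-contained sketch of splitting on digits. The regex "\\d", the sample input, and the names SplitOnDigits and splitOnDigits are assumptions chosen so that the program prints the five words shown above; they may differ from the values used in the tutorial.

```java
import java.util.regex.Pattern;

class SplitOnDigits {

    // Assumed delimiter: any single decimal digit.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Returns the pieces of the input that lie between digits.
    static String[] splitOnDigits(String input) {
        return DIGIT.split(input);
    }

    public static void main(String[] args) {
        // Hypothetical input with one digit between each pair of words.
        for (String s : splitOnDigits("one1two2three3four4five")) {
            System.out.println(s);
        }
    }
}
```

Compiling the pattern once into a constant avoids recompiling the regular expression on every call.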

8.5 Other Useful Methods

You may also find the following methods useful:

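public static String quote(String s): Returns a literal pattern String for the specified String. This method produces a String that can be used to create a Pattern that matches the string s as if it were a literal pattern. Metacharacters and escape sequences in the input sequence are given no special meaning.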
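The effect of quote can be illustrated with a minimal sketch. The class QuoteDemo, the helper matchesLiterally, and the sample string "1+1" are hypothetical; only Pattern.quote and Pattern.matches are real API calls.

```java
import java.util.regex.Pattern;

class QuoteDemo {

    // True if the input is exactly the given literal text,
    // with every metacharacter in the literal stripped of its special meaning.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted, "+" is a quantifier, so the regex 1+1 does not match the text "1+1".
        System.out.println(Pattern.matches("1+1", "1+1"));

        // Quoted, the same characters are matched literally.
        System.out.println(matchesLiterally("1+1", "1+1"));
    }
}
```

Quoting is mainly useful when part of a pattern comes from user input or other data that must not be interpreted as regular expression syntax.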

public String toString(): Returns the String representation of this pattern. This is the regular expression from which the pattern was compiled.

8.6 Pattern Method Equivalents in java.lang.String

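java.lang.String also supports regular expressions through several methods that mimic the behavior of java.util.regex.Pattern. For convenience, key excerpts from their API are presented below.

public boolean matches(String regex): Tells whether or not this string matches the given regular expression. An invocation of the form str.matches(regex) yields exactly the same result as the expression Pattern.matches(regex, str).

public String[] split(String regex, int limit): Splits this string around matches of the given regular expression. An invocation of the form str.split(regex, n) yields exactly the same result as the expression Pattern.compile(regex).split(str, n).

public String[] split(String regex): Splits this string around matches of the given regular expression. This method works as if the two-argument split method were invoked with the given expression and a limit of zero. Trailing empty strings are not included in the resulting array.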
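The equivalences described above can be checked with a short sketch. The class name StringRegexEquivalents, the helpers viaString and viaPattern, and the sample string "a:b:c::" are assumptions introduced for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class StringRegexEquivalents {

    // Splits using the convenience method on String.
    static String[] viaString(String str, String regex, int limit) {
        return str.split(regex, limit);
    }

    // Splits using an explicitly compiled Pattern.
    static String[] viaPattern(String str, String regex, int limit) {
        return Pattern.compile(regex).split(str, limit);
    }

    public static void main(String[] args) {
        String str = "a:b:c::"; // hypothetical input with two trailing empty fields

        // str.matches(regex) and Pattern.matches(regex, str) agree.
        System.out.println(str.matches("[a-c:]+") == Pattern.matches("[a-c:]+", str));

        // A limit of 2 stops after the first split.
        System.out.println(Arrays.toString(viaString(str, ":", 2)));

        // split(regex) behaves like a limit of zero: trailing empty strings are dropped.
        System.out.println(Arrays.toString(str.split(":")));
        System.out.println(Arrays.equals(str.split(":"), viaPattern(str, ":", 0)));
    }
}
```

The String methods compile the regular expression on each call, so a precompiled Pattern is usually the better choice when the same expression is applied many times.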

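There is also a replace method that replaces one CharSequence with another:

public String replace(CharSequence target, CharSequence replacement): Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end. For example, replacing "aa" with "b" in the string "aaa" yields "ba" rather than "ab".

9 Methods of the Matcher Class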

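This section looks at some additional useful methods of the Matcher class. For convenience, the methods listed below are grouped by functionality.

Index methods

Index methods provide index values that show precisely where a match was found in the input string:

public int start(): Returns the start index of the previous match.

public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.

public int end(): Returns the offset after the last character matched.

public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.

Study methods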

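Study methods review the input string and return a boolean indicating whether or not the pattern was found.

public boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern.

public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.

public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.

public boolean matches(): Attempts to match the entire region against the pattern.

Replacement methods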

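Replacement methods are useful for replacing text in an input string.

public Matcher appendReplacement(StringBuffer sb, String replacement): Implements a non-terminal append-and-replace step.

public StringBuffer appendTail(StringBuffer sb): Implements a terminal append-and-replace step.

public String replaceAll(String replacement): Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

public String replaceFirst(String replacement): Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.

public static String quoteReplacement(String s): Returns a literal replacement String for the specified String. This method produces a String that works as a literal replacement s in the appendReplacement method of the Matcher class. The String produced matches the sequence of characters in s treated as a literal sequence. Backslashes (\) and dollar signs ($) are given no special meaning.

9.1 Using the start and end Methods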

The example program MatcherDemo.java counts the number of times the word "dog" appears in the input sequence.
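The listing for MatcherDemo.java is not reproduced here, so the following is only a sketch of how such a count could be written with find, start, and end. The class name DogCounter, the regex "\\bdog\\b", and the sample input are assumptions and may differ from the tutorial's program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DogCounter {

    // Assumed regex: "dog" as a whole word, delimited by word boundaries.
    private static final Pattern DOG = Pattern.compile("\\bdog\\b");

    // Counts whole-word occurrences of "dog" and reports where each one lies.
    static int countDogs(String input) {
        Matcher m = DOG.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
            // start() is the index of the first matched character,
            // end() is the offset just past the last matched character.
            System.out.println("Match number " + count
                    + ": start() = " + m.start() + ", end() = " + m.end());
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical input; "doggie" and "dogg" are not whole-word matches.
        System.out.println("Total: " + countDogs("dog dog dog doggie dogg"));
    }
}
```

Because end() returns the offset after the last matched character, the pair start() and end() can be passed directly to String.substring to recover the matched text.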
