Tcl Scripting Language Tutorial (5)

2019-06-17 12:57

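In regular expressions, some characters have special meanings. The table below lists each of them with an explanation.

Character          Meaning
.                  Matches any single character.
^                  Anchors the match at the beginning of the string.
$                  Anchors the match at the end of the string.
\x                 Matches the character x literally, suppressing any special meaning of x.
[chars]            Matches any one character from the set chars. If the first character of chars is ^, matches any character not in chars. Ranges such as a-z are supported.
(regexp)           Treats regexp as a single item.
*                  Matches the item preceding * zero or more times.
+                  Matches the item preceding + one or more times.
?                  Matches the item preceding ? zero or one time.
regexp1|regexp2    Matches either regexp1 or regexp2.

The following example is taken from "Tcl and the Tk Toolkit":

    ^((0x)?[0-9a-fA-F]+|[0-9]+)$

This regular expression matches any hexadecimal or decimal integer.

The two regular expressions (0x)?[0-9a-fA-F]+ and [0-9]+ are separated by |, so a string may match either one. The first matches hexadecimal numbers and the second matches decimal numbers.

^ means the match must start at the beginning of the string, so the expression does not match strings such as jk12 that do not begin with 0x or a hexadecimal digit.

$ means the match must extend to the end of the string, so the expression does not match strings such as 12jk that do not end with a hexadecimal digit.

Taking (0x)?[0-9a-fA-F]+ in detail: (0x) groups 0x into a single item; ? means that the preceding item (0x) may appear zero times or once; [0-9a-fA-F] matches a single digit from 0 to 9 or a single letter from a to f or A to F; and + means that such a digit or letter may be repeated one or more times.

    % regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} ab
    1
    % regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 0xabcd
    1
    % regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 12345
    1
    % regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 123j
    0

If the regexp command is followed by the arguments matchVar and subMatchVar, all of them are treated as variable names; a variable that does not exist is created. regexp assigns the substring that matches the whole regular expression to the first variable, the substring that matches the leftmost subexpression to the second variable, and so on. For example, with an input string such as " there is 100  apples" (two spaces before "apples"):

    % regexp { ([0-9]+) *([a-z]+)} " there is 100  apples" total num word
    1
    % puts "$total ,$num,$word"
     100  apples ,100,apples

regexp accepts a number of switches that control the matching result:

-nocase
    Ignores case when matching.

-indices
    Changes the values stored in the variables: each variable receives the indices of the corresponding matching substring within the whole string, instead of the substring itself. For example:

    % regexp -indices { ([0-9]+) *([a-z]+)} " there is 100  apples" total num word
    1
    % puts "$total ,$num,$word"
    9 20 ,10 12,15 20

    The substring " 100  apples" occupies indices 9-20, "100" occupies 10-12, and "apples" occupies 15-20.

-about
    Returns information about the regular expression itself instead of matching it against the string. The result is a list: the first element is the number of subexpressions, and the second element is a list of property names describing attributes of the regular expression.

-expanded
    Enables the expanded syntax, in which whitespace and comments are ignored. Equivalent to the embedded option (?x).

-line
    Enables newline-sensitive matching. Normally ^ and $ match only at the beginning and end of the string, not at line boundaries inside it; with this switch they also match at the beginning and end of each line inside the string. Equivalent to specifying both -linestop and -lineanchor, or to the embedded option (?n).

-linestop
    Changes the behavior of [^ bracket expressions and . so that they stop at newlines. Equivalent to the embedded option (?p).

-lineanchor
    Changes the behavior of ^ and $ so that they match at the beginning and end of each line inside the string. Equivalent to the embedded option (?w).

-all
    Matches the regular expression as many times as possible in the string and returns the number of matches found.

-inline
    Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned.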
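The basic matching, match-variable, -indices, and -nocase behavior described above can be tried out with a short script. This is only a sketch: the input string " there is 100  apples" and the variable names ending in Idx are assumptions chosen for illustration, picked so that the reported indices are 9-20, 10-12, and 15-20.

```tcl
# Pattern from the text: a hexadecimal or decimal integer.
set intRe {^((0x)?[0-9a-fA-F]+|[0-9]+)$}

# regexp returns 1 on a match and 0 otherwise.
foreach s {ab 0xabcd 12345 123j jk12} {
    puts [format "%-8s -> %d" $s [regexp $intRe $s]]
}

# Assumed input string (not taken verbatim from the tutorial).
set text " there is 100  apples"
set re   { ([0-9]+) *([a-z]+)}

# Match variables: whole match first, then one variable per subexpression.
regexp $re $text total num word
puts "<$total> <$num> <$word>"

# -indices stores {first last} index pairs instead of substrings.
regexp -indices $re $text totalIdx numIdx wordIdx
puts "$totalIdx | $numIdx | $wordIdx"

# -nocase lets the lowercase pattern accept an uppercase 0X prefix.
set withoutNocase [regexp $intRe 0XABCD]
set withNocase    [regexp -nocase $intRe 0XABCD]
puts "without -nocase: $withoutNocase, with -nocase: $withNocase"
```

Running the script in tclsh prints one line per test string for the integer pattern, then the three captured substrings, the three index pairs, and the two -nocase results.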
With -inline, for each match iteration the command appends the overall match data, plus one element for each subexpression in the regular expression. Examples are:

    regexp -inline -- {\w(\w)} " inlined "
    => {in n}
    regexp -all -inline -- {\w(\w)} " inlined "
    => {in n li i ne e}

-start index
    Forces matching to begin at character offset index in the string. With this switch, ^ no longer matches the beginning of the line, while \A still matches the start of the string at offset index. If -indices is also given, the reported indices are counted from the absolute beginning of the input string. index is constrained to the bounds of the input string.

--
    Marks the end of the switches. The argument that follows is treated as the regular expression even if it begins with '-'.

【Detailed Rules of Tcl Regular Expressions】

◆DESCRIPTION

A regular expression describes strings of characters. It's a pattern that matches certain strings and doesn't match others.

◆DIFFERENT FLAVORS OF REs (differences from standard regular expressions)

Regular expressions, as defined by POSIX, come in two flavors: extended REs and basic REs. EREs are roughly those of the traditional egrep, while BREs are roughly those of the traditional ed. This implementation adds a third flavor, advanced REs, basically EREs with some significant extensions.

This manual page primarily describes AREs. BREs mostly exist for backward compatibility in some old programs; they will be discussed at the end. POSIX EREs are almost an exact subset of AREs. Features of AREs that are not present in EREs will be indicated.

◆REGULAR EXPRESSION SYNTAX

Tcl regular expressions are implemented using the package written by Henry Spencer, based on the 1003.2 spec and some (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description of regular expressions below is copied verbatim from his manual entry.

An ARE is one or more branches, separated by `|', matching anything that matches any of the branches.

A branch is zero or more constraints or quantified atoms, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. The quantifiers, and what a so-quantified atom matches, are:

Quantifier    Meaning
*             a sequence of 0 or more matches of the atom
+             a sequence of 1 or more matches of the atom
?             a sequence of 0 or 1 matches of the atom
{m}           a sequence of exactly m matches of the atom
{m,}          a sequence of m or more matches of the atom
{m,n}         a sequence of m through n (inclusive) matches of the atom; m may not exceed n
*?  +?  ??  {m}?  {m,}?  {m,n}?
              non-greedy quantifiers, which match the same possibilities, but prefer the smallest number rather than the largest number of matches (see MATCHING)

The forms using { and } are known as bounds. The numbers m and n are unsigned decimal integers with permissible values from 0 to 255 inclusive.

An atom is one of:

Atom       Meaning
(re)       (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re)     as previous, but does no reporting
()         matches an empty string, noted for possible reporting
(?:)       matches an empty string, without reporting
[chars]    a bracket expression, matching any one of the chars (see BRACKET EXPRESSIONS for more detail)
.          matches any single character
\k         (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character
\c         where c is alphanumeric (possibly followed by other characters), an escape (AREs only), see ESCAPES below
{          when followed by a character other than a digit, matches the left-brace character `{'; when followed by a digit, it is the beginning of a bound (see above)
x          where x is a single character with no other significance, matches that character

A constraint matches an empty string when specific conditions are met. A constraint may not be followed by a quantifier. The simple constraints are as follows; some more constraints are described later, under ESCAPES.
Constraint    Meaning
^             matches at the beginning of a line
$             matches at the end of a line
(?=re)        positive lookahead (AREs only), matches at any point where a substring matching re begins
(?!re)        negative lookahead (AREs only), matches at any point where no substring matching re begins

The lookahead constraints may not contain back references (see later), and all parentheses within them are considered non-capturing.

An RE may not end with `\'.

◆BRACKET EXPRESSIONS

A bracket expression is a list of characters enclosed in `[]'. It normally matches any single character from the list (but see below). If the list begins with `^', it matches any single character (but see below) not from the rest of the list.

If two characters in the list are separated by `-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. [0-9] in ASCII matches any decimal digit. Two ranges may not share an endpoint, so e.g. a-c-e is illegal. Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.

To include a literal ] or - in the list, the simplest method is to enclose it in [. and .] to make it a collating element (see below). Alternatively, make it the first character (following a possible `^'), or (AREs only) precede it with `\'. Alternatively, for `-', make it the last character, or the second endpoint of a range. To use a literal - as the first endpoint of a range, make it a collating element or (AREs only) precede it with `\'. With the exception of these, some combinations using [ (see next paragraphs), and escapes, all other special characters lose their special significance within a bracket expression.

Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in [. and .] stands for the sequence of characters of that collating element.
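As a rough illustration of the bounds, non-greedy quantifiers, lookahead constraints, and bracket-expression rules described so far, the following sketch exercises each one with regexp. All input strings and variable names are hypothetical examples, not taken from the manual page.

```tcl
# Bounds: {2,4} allows between two and four repetitions of the atom.
set twoToFourOk  [regexp {^[0-9]{2,4}$} 123]
set twoToFourBad [regexp {^[0-9]{2,4}$} 12345]

# Greedy + takes as much as possible; non-greedy +? takes as little as possible.
regexp {<.+>}  "<a><b>" greedy
regexp {<.+?>} "<a><b>" lazy

# Non-capturing (?:re) groups without reporting, so only (c) fills a variable.
regexp {(?:ab)+(c)} "ababc" whole sub

# Positive lookahead: digits must be followed by "px", which is not consumed.
regexp {[0-9]+(?=px)} "width 120px" number

# Negative lookahead: "foo" must not be followed by "bar".
set fooBar [regexp {foo(?!bar)} "foobar"]
set fooBaz [regexp {foo(?!bar)} "foobaz"]

# Bracket expression with a literal ] placed first and a literal - placed last.
set literalChars [regexp {^[]a-]+$} "a]-"]

# Negated bracket expression: any character that is not a decimal digit.
set noDigits   [regexp {^[^0-9]+$} "abc"]
set hasDigit   [regexp {^[^0-9]+$} "ab1"]

puts "bounds: $twoToFourOk $twoToFourBad"
puts "greedy: $greedy  non-greedy: $lazy"
puts "whole: $whole  sub: $sub"
puts "lookahead: $number $fooBar $fooBaz"
puts "brackets: $literalChars $noDigits $hasDigit"
```

Each result is stored in a variable before printing, so the individual cases can be inspected or changed one at a time in an interactive tclsh session.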
The sequence is a single element of the bracket expression's list. A bracket expression in a locale that has multi-character collating elements can thus match more than one character. So (insidiously), a bracket expression that starts with ^ can match multi-character collating elements even if none of them appear in the bracket expression! (Note: Tcl currently has no multi-character collating elements. This information is only for illustration.) For example, assume the collating sequence includes a ch multi-character collating element. Then the RE [[.ch.]]*c (zero or more ch's followed by c) matches the first
