Capture groups in Java regular expressions

Capture groups come in two kinds:

  • Ordinary capture groups: (Expression)
  • Named capture groups: (?<name>Expression)

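Ordinary capture groups

Scanning the regular expression from left to right, each opening parenthesis "(" starts a new group. Group numbers start at 1; group 0 stands for the entire expression.

For the date string 2017-04-25, the expression (written as a Java string literal, hence the doubled backslashes) is: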

(\\d{4})-((\\d{2})-(\\d{2}))

There are 4 opening parentheses, so there are 4 groups:

No.  Capture group                 Match
0    (\d{4})-((\d{2})-(\d{2}))     2017-04-25
1    (\d{4})                       2017
2    ((\d{2})-(\d{2}))             04-25
3    (\d{2})                       04
4    (\d{2})                       25
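The numbering rule can be checked programmatically. The sketch below is illustrative only: the class name GroupNumberingSketch and the helper allGroups are hypothetical and not part of the original article. It relies on Matcher.groupCount(), which reports the number of capture groups excluding group 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingSketch {
    // Hypothetical helper: returns group 0 (the whole match) followed by
    // groups 1..groupCount(), or an empty list when nothing matches.
    static List<String> allGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        if (matcher.find()) {
            // groupCount() does not count group 0, so the upper bound is inclusive.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups = allGroups("(\\d{4})-((\\d{2})-(\\d{2}))", "2017-04-25");
        for (int i = 0; i < groups.size(); i++) {
            System.out.printf("group(%d) = %s%n", i, groups.get(i));
        }
    }
}
```

The list has five entries for this pattern: the whole match plus one entry per opening parenthesis.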

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class-level constants
public static final String DATE_STRING = "2017-04-25";
public static final String P_COMM = "(\\d{4})-((\\d{2})-(\\d{2}))";

// The statements below belong inside a method such as main
Pattern pattern = Pattern.compile(P_COMM);
Matcher matcher = pattern.matcher(DATE_STRING);
matcher.find(); // Required: group() throws IllegalStateException until a match has succeeded
System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
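To make it concrete why a match must be attempted before reading groups, here is a small sketch. The class FindBeforeGroupSketch and its two helpers are hypothetical names introduced for illustration. It assumes only the documented behavior that Matcher.group(int) throws IllegalStateException when no match has been attempted or the last attempt failed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindBeforeGroupSketch {
    static final Pattern P_COMM = Pattern.compile("(\\d{4})-((\\d{2})-(\\d{2}))");

    // Returns true if group(0) throws IllegalStateException on a fresh matcher,
    // that is, before find() or matches() has been called.
    static boolean groupFailsBeforeFind(String input) {
        Matcher matcher = P_COMM.matcher(input);
        try {
            matcher.group(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Safer pattern: read groups only when find() reports a match.
    static String yearOrNull(String input) {
        Matcher matcher = P_COMM.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupFailsBeforeFind("2017-04-25"));
        System.out.println(yearOrNull("2017-04-25"));
        System.out.println(yearOrNull("not a date"));
    }
}
```

Checking the return value of find() also covers input that does not match at all, which the bare matcher.find() call does not guard against.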

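Named capture groups

In a named capture group, the opening parenthesis is immediately followed by ?<name>, and only then by the expression: (?<name>Expression).

For the date string 2017-04-25, the expression is: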

(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))

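There are 4 named capture groups:

No.  Name     Capture group                                             Match
0    (none)   (?<year>\d{4})-(?<md>(?<month>\d{2})-(?<date>\d{2}))      2017-04-25
1    year     (?<year>\d{4})                                            2017
2    md       (?<md>(?<month>\d{2})-(?<date>\d{2}))                     04-25
3    month    (?<month>\d{2})                                           04
4    date     (?<date>\d{2})                                            25

The values of named capture groups can also be retrieved by number.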

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class-level constants
public static final String P_NAMED = "(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))";
public static final String DATE_STRING = "2017-04-25";

// The statements below belong inside a method such as main
Pattern pattern = Pattern.compile(P_NAMED);
Matcher matcher = pattern.matcher(DATE_STRING);
matcher.find();
System.out.printf("\n=========== Retrieve by name =============");
System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
System.out.printf("\nmatcher.group('year') value:%s", matcher.group("year"));
System.out.printf("\nmatcher.group('md') value:%s", matcher.group("md"));
System.out.printf("\nmatcher.group('month') value:%s", matcher.group("month"));
System.out.printf("\nmatcher.group('date') value:%s", matcher.group("date"));
matcher.reset(); // Discard the match state so find() starts again from the beginning
System.out.printf("\n=========== Retrieve by number =============");
matcher.find();
System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
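Named groups can also be referenced from a replacement string using the ${name} syntax. The sketch below is an illustration, not code from the original article: the class NamedGroupSketch and the helpers reorder and nameAndNumberAgree are hypothetical. It reuses the same named pattern and confirms that each name and its number refer to the same captured text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSketch {
    static final Pattern P_NAMED = Pattern.compile(
            "(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))");

    // Rewrites yyyy-MM-dd as dd/MM/yyyy by referencing groups by name
    // in the replacement string.
    static String reorder(String input) {
        return P_NAMED.matcher(input).replaceAll("${date}/${month}/${year}");
    }

    // Returns true when every named group yields the same text as its number.
    static boolean nameAndNumberAgree(String input) {
        Matcher m = P_NAMED.matcher(input);
        return m.find()
                && m.group("year").equals(m.group(1))
                && m.group("md").equals(m.group(2))
                && m.group("month").equals(m.group(3))
                && m.group("date").equals(m.group(4));
    }

    public static void main(String[] args) {
        System.out.println(reorder("2017-04-25"));
        System.out.println(nameAndNumberAgree("2017-04-25"));
    }
}
```

Referring to groups by name keeps the replacement string valid even if another group is later added to the front of the pattern, which would shift every number.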

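PS: Non-capturing groups

Putting ?: immediately after the opening parenthesis, followed by the expression, forms a non-capturing group: (?:Expression)

For the date string 2017-04-25, the expression is: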

(?:\\d{4})-((\\d{2})-(\\d{2}))

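Although this regular expression has four opening parentheses, it has only 3 capture groups. The first group, (?:\d{4}), still takes part in matching but is neither captured nor numbered, so the remaining groups are numbered 1 to 3. Calling matcher.group(4) therefore throws an IndexOutOfBoundsException.

No.  Capture group                   Match
0    (?:\d{4})-((\d{2})-(\d{2}))     2017-04-25
1    ((\d{2})-(\d{2}))               04-25
2    (\d{2})                         04
3    (\d{2})                         25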

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class-level constants
public static final String P_UNCAP = "(?:\\d{4})-((\\d{2})-(\\d{2}))";
public static final String DATE_STRING = "2017-04-25";

// The statements below belong inside a method such as main
Pattern pattern = Pattern.compile(P_UNCAP);
Matcher matcher = pattern.matcher(DATE_STRING);
matcher.find();
System.out.printf("\nmatcher.group(0) value:%s", matcher.group(0));
System.out.printf("\nmatcher.group(1) value:%s", matcher.group(1));
System.out.printf("\nmatcher.group(2) value:%s", matcher.group(2));
System.out.printf("\nmatcher.group(3) value:%s", matcher.group(3));
// The next call fails with:
// Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 4
System.out.printf("\nmatcher.group(4) value:%s", matcher.group(4));
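One way to avoid the exception is to compare the requested index with groupCount() before calling group(). The following is a minimal sketch under that assumption. The class NonCapturingGroupSketch and the helper groupOrNull are hypothetical names, not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingGroupSketch {
    static final Pattern P_UNCAP = Pattern.compile("(?:\\d{4})-((\\d{2})-(\\d{2}))");

    // Number of capture groups in the pattern, not counting group 0.
    // groupCount() depends only on the pattern, so no match is needed.
    static int captureGroupCount() {
        return P_UNCAP.matcher("").groupCount();
    }

    // Returns the text of the requested group, or null when the input does
    // not match or the pattern has no group with that number.
    static String groupOrNull(String input, int index) {
        Matcher matcher = P_UNCAP.matcher(input);
        if (!matcher.find() || index < 0 || index > matcher.groupCount()) {
            return null;
        }
        return matcher.group(index);
    }

    public static void main(String[] args) {
        System.out.println(captureGroupCount());
        for (int i = 0; i <= 4; i++) {
            System.out.printf("group(%d) = %s%n", i, groupOrNull("2017-04-25", i));
        }
    }
}
```

Because the year is no longer captured, group 1 now holds the month-day part, and index 4 falls outside the valid range.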

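Summary

  • Ordinary capture groups are convenient to use;
  • Named capture groups are clearer to read;
  • I have not yet found a use for non-capturing groups in my projects.

Original article: https://blog.csdn.net/just4you/article/details/70767928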

Original article by zhang sir. If you repost it, please credit the source: https://www.ysidc.top/5974.html