Python Regular Expressions Handbook

2024-06-04

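Regular expressions are hard to beat when it comes to processing strings.

That power comes at a cost: the learning curve is steep.

Below I summarize some common matching techniques, each with a Python example, in the hope that they help when you learn or use regular expressions.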

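Character matching

Character matching: match a specific character or a set of characters, such as a single character from the class [a-z] or a digit with \d.

import re

# Match a single character from a set: b, one vowel, then t
pattern = r"b[aeiou]t"
string = "bat, bet, bit, bot, but"
result = re.findall(pattern, string)
print(result) # ['bat', 'bet', 'bit', 'bot', 'but']

# Match digit characters (one or more in a row)
pattern = r"\d+"
string = "123 456 789"
result = re.findall(pattern, string)
print(result) # ['123', '456', '789']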
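
Character classes can also be negated with ^ or combined with shorthand escapes such as \w and \s. The following is a minimal sketch of that idea; the sample string and variable names are made up for illustration.

```python
import re

text = "id: A7, b9, C3!"

# [A-Z]\d : one uppercase letter followed by one digit
upper_codes = re.findall(r"[A-Z]\d", text)

# [^\w\s] : any single character that is neither a word character nor whitespace
punctuation = re.findall(r"[^\w\s]", text)

print(upper_codes)   # ['A7', 'C3']
print(punctuation)   # [':', ',', ',', '!']
```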

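Position matching

Position matching: match a position in the string rather than a character, such as the start of a line, the end of a line, or a word boundary.

import re

# Match the start of a line (re.MULTILINE makes ^ match after every newline)
pattern = r"^The"
string = "The quick brown fox\nThe lazy dog"
result = re.findall(pattern, string, re.MULTILINE)
print(result) # ['The', 'The']

# Match a whole word using word boundaries
pattern = r"\bfox\b"
string = "The quick brown fox\njumps over the lazy dog"
result = re.findall(pattern, string)
print(result) # ['fox']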
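
The end-of-line anchor $ follows the same rule as ^. A small sketch, assuming the same two-line sample string, of how re.MULTILINE changes what $ matches (the variable names are illustrative):

```python
import re

text = "The quick brown fox\nThe lazy dog"

# Without re.MULTILINE, $ matches only at the very end of the string
# (or just before a trailing newline).
last_word_only = re.findall(r"\w+$", text)

# With re.MULTILINE, $ also matches just before each newline.
last_word_per_line = re.findall(r"\w+$", text, re.MULTILINE)

print(last_word_only)       # ['dog']
print(last_word_per_line)   # ['fox', 'dog']
```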

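Repetition matching

Repetition matching: match a character or character set that repeats, either an exact number of times or a number of times within a range.

import re

# Match an exact repeat count
pattern = r"a{3}"
string = "aaa abc aa a"
result = re.findall(pattern, string)
print(result) # ['aaa']

# Match a repeat range (2 to 3 digits); findall scans left to right without overlapping,
# so 1234 yields '123' (the leftover '4' is too short) and 12345 yields '123' then '45'
pattern = r"\d{2,3}"
string = "12 123 1234 12345"
result = re.findall(pattern, string)
print(result) # ['12', '123', '123', '123', '45']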
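
A pattern such as \d{2,3} is not anchored, so findall can return fragments of longer digit runs. If only complete two- or three-digit numbers are wanted, one option is to add word boundaries. A minimal sketch with a made-up sample string:

```python
import re

text = "7 42 2024 31415"

# Unanchored: longer digit runs are chopped into pieces of 2 to 3 digits.
pieces = re.findall(r"\d{2,3}", text)

# Anchored with \b on both sides: only complete 2- or 3-digit numbers match.
whole_numbers = re.findall(r"\b\d{2,3}\b", text)

print(pieces)          # ['42', '202', '314', '15']
print(whole_numbers)   # ['42']
```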

Alternation matching

Alternation matching: match any one of several alternatives, such as option 1 or option 2.

import re

# Match one of several alternatives
pattern = r"cat|dog"
string = "The quick brown fox jumps over the lazy dog"
result = re.findall(pattern, string)
print(result) # ['dog']

# Match one of several alternatives (ignoring case)
pattern = r"fox|dog"
string = "The quick brown Fox jumps over the lazy Dog"
result = re.findall(pattern, string, re.IGNORECASE)
print(result) # ['Fox', 'Dog']
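
The | operator has the lowest precedence in a pattern, so alternatives usually need to be grouped before anything else is attached to them. A minimal sketch using a non-capturing group (?:...); the sample sentence is made up for illustration:

```python
import re

text = "Cats and dogs, a catalog, one dog"

# (?:...) groups the alternatives without capturing, so s? and \b apply to both.
animals = re.findall(r"\b(?:cat|dog)s?\b", text, re.IGNORECASE)

print(animals)   # ['Cats', 'dogs', 'dog']
```

Because of the trailing \b, "catalog" is not matched even though it starts with "cat".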

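Group matching

Group matching: wrap part of the pattern in parentheses to mark it as a subexpression (a capture group), for example to extract substrings.

import re

# Extract substrings: each pair of parentheses captures one part of the date
pattern = r"(\d{4})-(\d{2})-(\d{2})"
string = "2022-03-02 is a good day"
result = re.findall(pattern, string)
print(result) # [('2022', '03', '02')]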
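
Groups can also be given names with the (?P<name>...) syntax, which tends to read better than numeric indexes. A short sketch on the same date string, using re.search and the match object's group and groupdict methods (the group names are my own choice):

```python
import re

text = "2022-03-02 is a good day"

match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)

if match:
    print(match.group(0))        # 2022-03-02
    print(match.group("year"))   # 2022
    print(match.groupdict())     # {'year': '2022', 'month': '03', 'day': '02'}
```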

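Backreference matching

Backreference matching: match again the text that an earlier capture group already matched, for example to find repeated words.

import re

# Find repeated words: \1 must match the same text that group 1 captured
pattern = r"\b(\w+)\s+\1\b"
string = "this is is a test test"
result = re.findall(pattern, string)
print(result) # ['is', 'test']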
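
A backreference such as \1 refers to whatever text capture group 1 matched, and it can be used in the replacement string of re.sub as well as in the pattern. The sketch below collapses repeated words into one; the sample sentence and variable names are illustrative.

```python
import re

text = "this is is a test test"

# \b(\w+) captures a word; (?:\s+\1\b)+ requires the same word to follow one or more times.
# In the replacement, \1 puts the captured word back once.
deduplicated = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text)

print(deduplicated)   # this is a test
```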

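Greedy and lazy matching

Greedy and lazy matching: a greedy match consumes as many characters as possible, while a lazy (non-greedy) match consumes as few as possible.

import re

# Greedy match: .* runs to the last > in the string
pattern = r"<.*>"
string = "<a>hello</a><b>world</b>"
result = re.findall(pattern, string)
print(result) # ['<a>hello</a><b>world</b>']

# Lazy match: .*? stops at the first > it can
pattern = r"<.*?>"
string = "<a>hello</a><b>world</b>"
result = re.findall(pattern, string)
print(result) # ['<a>', '</a>', '<b>', '</b>']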
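
A common alternative to a lazy quantifier is a negated character class, which states directly which characters may appear inside the match. The sketch below shows that variant and, as a second illustration, a lazy capture of the text between a tag and its closing tag; the variable names are my own.

```python
import re

html = "<a>hello</a><b>world</b>"

# [^>]* can never run past a closing '>', so no lazy quantifier is needed.
tags = re.findall(r"<[^>]*>", html)

# Lazy capture of the text between a tag and its matching closing tag (\1 = tag name).
contents = re.findall(r"<(\w+)>(.*?)</\1>", html)

print(tags)       # ['<a>', '</a>', '<b>', '</b>']
print(contents)   # [('a', 'hello'), ('b', 'world')]
```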

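Zero-width assertion matching

Zero-width assertion matching: a zero-width assertion (lookaround) matches a position rather than a character.

It requires the text before or after the current position to satisfy some condition, which allows more precise matching.

Regular expressions have four common zero-width assertions:

Positive lookahead (?=...): matches a position that is immediately followed by text matching the subpattern; that text is not included in the match.

import re

# Positive lookahead: match hello only where it is followed by world
pattern = r"hello(?=world)"
string = "hellopythonhelloworld"
result = re.findall(pattern, string)
print(result) # ['hello'] (the second hello)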

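Negative lookahead (?!...): matches a position that is not immediately followed by text matching the subpattern.

import re

# Negative lookahead: match hello only where it is not followed by world
pattern = r"hello(?!world)"
string = "hellopythonhelloworld"
result = re.findall(pattern, string)
print(result) # ['hello'] (the first hello)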

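Positive lookbehind (?<=...): matches a position that is immediately preceded by text matching the subpattern; that text is not included in the match.

import re

# Positive lookbehind: match hello only where it is preceded by python
pattern = r"(?<=python)hello"
string = "pythonhellopythonworld"
result = re.findall(pattern, string)
print(result) # ['hello']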

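Negative lookbehind (?<!...): matches a position that is not immediately preceded by text matching the subpattern.

import re

# Negative lookbehind: match hello only where it is not preceded by python
pattern = r"(?<!python)hello"
string = "pythonhellopythonworld"
result = re.findall(pattern, string)
print(result) # [] (the only hello is preceded by python)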
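
Because lookarounds match positions without consuming characters, they can be combined to insert text at exactly the right spots. The sketch below inserts thousands separators into a string of digits. The function name add_thousands_separators is hypothetical, and the code assumes the input contains only digits.

```python
import re

def add_thousands_separators(digits: str) -> str:
    # Hypothetical helper; assumes `digits` contains only the characters 0-9.
    # (?<=\d)         : there is a digit right before this position
    # (?=(?:\d{3})+$) : the rest of the string is one or more complete groups of three digits
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", digits)

print(add_thousands_separators("1234567"))   # 1,234,567
print(add_thousands_separators("123"))       # 123
```

For ordinary number formatting, Python's format(1234567, ",") is simpler; the regex version is shown only to illustrate how a lookbehind and a lookahead can pin down a position.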


If there is a specific topic you would like me to cover, feel free to leave me a comment.
