Regex error: look-behind requires fixed-width pattern

SHocker77 · Posted 2016-9-23 10:37:36

Code:

# -*- coding:utf-8 -*-
import re
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
# The first 编号 is followed by a full-width (Chinese) colon, the second by an ASCII colon
text = "JGood iss a 编号：123456 boy, he is 编号:123456, clever, and so on..."
reg = re.compile(r'(?<=编号(:|：))\d+')
match = reg.search(text)
if match:
    print match.group(0)
else:
    print 'not match'


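The goal is to print the digits that follow the label, whether the colon is the Chinese (full-width) one or the ASCII one.

Error message: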

C:\Users\pc\Desktop>python test.py
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    reg = re.compile(r'(?<=缂栧彿(:|锛?)\s*\S+')
  File "D:\Python 2.7\lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "D:\Python 2.7\lib\re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern


I suspect this is a Chinese encoding problem, but even after adding

import sys
reload(sys)
sys.setdefaultencoding("utf-8")


I still get the same error. I searched for this error message but did not find a working fix. Could someone please take a look?

python奋青 · Posted 2016-9-23 22:03:50

Replace (:|：) with [:|：]
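
Some background on why this suggestion targets the right spot, as a rough sketch rather than a definitive diagnosis. It is written in Python 3 syntax and emulates 2.7's un-prefixed literals by encoding to UTF-8, which is an assumption about the original setup. Every alternative inside a look-behind must have the same width, and in UTF-8 bytes the ASCII colon takes one byte while the full-width colon takes three. Note also that | inside [...] is a literal pipe, so [:：] is the tighter class.

```python
import re

# Emulate a Python 2.7 byte-string pattern by encoding it (assumption).
grouped = '(?<=编号(:|：))\\d+'.encode('utf-8')

print(len(':'.encode('utf-8')))   # byte width of the ASCII colon
print(len('：'.encode('utf-8')))  # byte width of the full-width colon

# The two alternatives differ in width, so compiling is expected to fail.
try:
    re.compile(grouped)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# On unicode text both colons are one character wide, so a class is enough.
text = 'JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on...'
numbers = re.findall(r'(?<=编号[:：])\d+', text)
print(numbers)
```

The character class only helps when pattern and text are both unicode; on byte strings it matches a single byte at a time.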

whydo1 · Posted 2016-9-24 21:12:55

I'm on Python 3.4.4 and the following code runs fine; adapt it as needed for 2.x

import re
import sys
# The first 编号 is followed by a full-width (Chinese) colon, the second by an ASCII colon
text = "JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on..."
reg = re.compile(r'(?<=编号[:：])\d+')
match = reg.findall(text)
if match:
    print(match)
else:
    print('not match')


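SHocker77 · Posted 2016-9-28 16:56:01

whydo1 posted on 2016-9-24 21:12
I'm on Python 3.4.4 and the following code runs fine; adapt it as needed for 2.x

I'm on 2.7. Running your code with # coding=utf-8 at the top prints ['654321']. Without that line I get: File "test.py", line 6
SyntaxError: Non-ASCII character '\xe7' in file test.py on line 6, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

With the declaration added, only the number after the ASCII colon is printed; the Chinese colon still does not seem to be recognized.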
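
A possible explanation for seeing only ['654321'], sketched in Python 3 syntax with the 2.7 byte-string behaviour emulated via .encode('utf-8') (an assumption about what 2.7 does with un-prefixed literals in a UTF-8 source file). In byte mode the class [:：] is a set of four single bytes, so it can never consume the whole three-byte full-width colon.

```python
import re

text = 'JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on...'
pattern = '(?<=编号[:：])\\d+'

# Byte strings: the class matches exactly one byte, so only the
# one-byte ASCII colon can sit between the label and the digits.
byte_hits = re.findall(pattern.encode('utf-8'), text.encode('utf-8'))

# Unicode strings: the full-width colon is a single character.
unicode_hits = re.findall(pattern, text)

print(byte_hits)
print(unicode_hits)
```

If this matches the 2.7 behaviour, the fix is to make both the text and the pattern unicode rather than to change the regex itself.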

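SHocker77 · Posted 2016-9-28 16:57:47

python奋青 posted on 2016-9-23 22:03
Replace (:|：) with [:|：]

Thanks for the reply. I had already tried that. I'm on 2.7 and I'm not sure whether that is the cause: with the Chinese colon written on its own, (?<=编号：), I do get the number after the Chinese colon, but the combined form fails.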
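
Since each single-colon look-behind works on its own, one workaround is to keep them separate and join them outside the look-behind, or to drop the look-behind and capture the digits. The sketch below uses Python 3 syntax, and the variable names are made up for illustration.

```python
import re

text = 'JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on...'

# Option 1: one fixed-width look-behind per colon, alternated outside.
two_lookbehinds = re.compile(r'(?:(?<=编号:)|(?<=编号：))\d+')

# Option 2: no look-behind; capture only the digits.
captured = re.compile(r'编号(?::|：)(\d+)')

print(two_lookbehinds.findall(text))
print(captured.findall(text))

# Option 1 does not depend on the colons having equal width, so it also
# compiles and matches on UTF-8 bytes (emulating 2.7 byte strings).
byte_pattern = '(?:(?<=编号:)|(?<=编号：))\\d+'.encode('utf-8')
byte_hits = re.findall(byte_pattern, text.encode('utf-8'))
print(byte_hits)
```

Option 2 is usually the simpler choice when only the digits are needed, because a capture group has no width restriction.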

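whydo1 · Posted 2016-9-28 22:03:11

SHocker77 posted on 2016-9-28 16:56
I'm on 2.7. Running your code with # coding=utf-8 at the top prints ['654321']. Without that line I get: File "test.py" ...

text = "JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on..."
change to
text = u"JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on..."

In 2.7, un-prefixed string literals are byte strings by default; the u prefix makes it a unicode string. The pattern needs to be unicode as well, e.g. ur'(?<=编号[:：])\d+', so that the full-width colon counts as a single character and the pattern can match the unicode text.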
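
Putting the pieces together, here is a minimal sketch that should run unchanged on both 2.7 and 3.3+, assuming the file is saved as UTF-8. It uses u'' for the pattern too, with the backslash doubled because ur'' literals are not accepted by Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Both the text and the pattern are unicode, so each colon is one character.
text = u"JGood iss a 编号：123456 boy, he is 编号:654321, clever, and so on..."
reg = re.compile(u'(?<=编号[:：])\\d+')

numbers = reg.findall(text)
if numbers:
    print(numbers)
else:
    print('not match')
```

On 2.7 the printed list shows u'' prefixes on each item; the matched digits are the same.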