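```python
# -*- coding: utf-8 -*-
import jieba

# Read the whole novel into one string (this is the line that raises the error below)
txt = open("小说.txt", "r", encoding='utf-8').read()
words = jieba.lcut(txt)  # segment the Chinese text into a list of words

counts = {}
for word in words:
    if len(word) == 1:
        continue  # skip single-character tokens (punctuation, particles)
    counts[word] = counts.get(word, 0) + 1

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for word, count in items[:50]:  # top 50; slicing avoids IndexError on short texts
    print("{0:<10}{1:>5}".format(word, count))
```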
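For reference, the counting loop above can also be written with the standard library's `collections.Counter`, whose `most_common(n)` method does the sort-and-slice in one call. This is a minimal sketch, not the original code: `top_words` and the sample token list are hypothetical, and the list stands in for what `jieba.lcut(txt)` would return so the snippet runs without jieba installed.

```python
from collections import Counter


def top_words(words, n=50, min_len=2):
    """Return the n most frequent tokens whose length is at least min_len."""
    # Skip single-character tokens, mirroring the len(word) == 1 check above
    counts = Counter(w for w in words if len(w) >= min_len)
    # most_common() returns (token, count) pairs, highest count first
    return counts.most_common(n)


# Hypothetical token list standing in for the output of jieba.lcut(txt)
sample = ["张三", "的", "李四", "张三", "了", "王五", "李四", "张三"]
for word, count in top_words(sample, n=3):
    print("{0:<10}{1:>5}".format(word, count))
```

Filtering inside the generator keeps single characters out of the counter entirely, so `most_common` only ranks the words you care about.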
I want to count the 50 most frequent words in a document. This is code I copied from the internet, but running it gives the following error:
```
Traceback (most recent call last):
  File "F:/1/个人学习/python/统计频率.py", line 4, in <module>
    txt = open("小说.txt","r",encoding='utf-8').read()
  File "D:\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte
```
How should I fix this? Thanks, everyone.
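The byte 0xc1 can never begin a valid UTF-8 character, so the file is most likely not UTF-8 at all. Text files saved on Chinese-language Windows (for example by Notepad) are often in GBK, where 0xc1 is a common lead byte (the character 了 is encoded as 0xc1 0xcb). Assuming that is the case here, the simplest fix is to change line 4 to `encoding='gbk'`, or to re-save `小说.txt` as UTF-8 in your editor. The sketch below is a more tolerant alternative: a hypothetical helper `read_text` that tries UTF-8 first and falls back to GB18030 (a superset of GBK). The demo writes its own GBK file, so the filename and contents are illustrative only.

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "gb18030")):
    """Read a text file, trying each encoding in turn until one decodes cleanly."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue  # wrong guess; try the next encoding
    raise ValueError(f"could not decode {path!r} with any of {encodings}")


# Demo: create a GBK-encoded file whose first byte is 0xc1, as in the traceback
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("了不起的小说".encode("gbk"))
    demo_path = tmp.name

text = read_text(demo_path)  # UTF-8 fails at position 0, GB18030 succeeds
os.remove(demo_path)
print(text)
```

In the original script, `txt = read_text("小说.txt")` would replace the `open(...).read()` line. Put the encoding you actually expect first in the tuple, because a wrong encoding can occasionally decode without error and silently produce garbled text.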