Help: how do I count the most frequent words in a Word document?

小西xiaoxi1 · Posted on 2021-9-15 16:46:00

# -*- coding: utf-8 -*-
import jieba

txt = open("小说.txt","r",encoding='utf-8').read()
words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}

for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    else:
        counts[word] = counts.get(word, 0) + 1

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first

for i in range(3):  # prints only the top 3 entries
    word, count = items[i]
    print("{0:<5}{1:5}".format(word, count))

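I want to find the 50 most frequent words in the document. This is code I copied from the internet, but running it produces the following error: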

Traceback (most recent call last):
  File "F:/1/个人学习/python/统计频率.py", line 4, in <module>
    txt = open("小说.txt","r",encoding='utf-8').read()
  File "D:\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte

How should I change it? Thanks, everyone.
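
A possible direction, offered only as a sketch: the traceback says the very first byte of the file (0xc1) is not valid UTF-8, which often means the file was saved in a different encoding. On Chinese Windows systems a common candidate is GBK, but that is an assumption here, not something the traceback proves. The helper names `read_text_with_fallback` and `top_words` below are hypothetical, and the list of encodings to try is a guess that may need adjusting for the actual file.

```python
from collections import Counter


def read_text_with_fallback(path, encodings=("utf-8", "gb18030")):
    """Read the file as bytes and return the first decode that succeeds.

    The default encodings are an assumption: gb18030 is a superset of GBK,
    so it covers text files saved in the usual legacy Chinese encodings.
    """
    with open(path, "rb") as f:
        data = f.read()
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError(f"could not decode {path!r} with any of {encodings}")


def top_words(words, n=50):
    """Return the n most common words, skipping single-character tokens."""
    counts = Counter(w for w in words if len(w) > 1)
    return counts.most_common(n)
```

With these in place, the original script could read the text with `txt = read_text_with_fallback("小说.txt")`, segment it with `jieba.lcut(txt)` as before, and print the result of `top_words(words, 50)`. Note that this only applies to plain-text files; if the source is a real Word document (.doc or .docx), it would need to be saved as text first or read with a library meant for that format.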

