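I'm trying to scrape the content of the Baidu Baike entry for Python, but I'm stuck at the last step, writing it to a txt file. I'm a Python beginner and finding this hard, so I'd appreciate help with this small problem. Here is the code: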
import urllib2
import re
from bs4 import BeautifulSoup
url='https://baike.baidu.com/item/Python/407313?fr=aladdin'
f=open('python.text','w')
webpage=urllib2.urlopen(url).read()
soup=BeautifulSoup(webpage,'html.parser',from_encoding='utf-8')
ds=soup.find_all('div')
for content in ds:
    if content.get('class')==['para']:
        f.write(content.get_text())
Spyder reports the following error:
runfile('C:/Users/123/Desktop/爬虫练习/1.14.py', wdir='C:/Users/123/Desktop/爬虫练习')
Traceback (most recent call last):
  File "<ipython-input-10-905b2147af03>", line 1, in <module>
    runfile('C:/Users/123/Desktop/爬虫练习/1.14.py', wdir='C:/Users/123/Desktop/爬虫练习')
  File "C:\Users\123\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 585, in runfile
    execfile(filename, namespace)
  File "C:/Users/123/Desktop/爬虫练习/1.14.py", line 25, in <module>
    f.write(content.get_text())
UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in position 9: illegal multibyte sequence
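For reference, the traceback shows the text being encoded with 'gbk' at write time, and u'\xa0' (a non-breaking space) has no GBK representation. One possible way around this, sketched below under assumptions, is to open the output file with an explicit UTF-8 encoding via io.open, which accepts an encoding argument in both Python 2 and Python 3. The helper name write_paragraphs, the sample strings, and the output path are hypothetical stand-ins for the scraped paragraphs; this is not a tested fix for the exact script above.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_paragraphs(paragraphs, path):
    """Write unicode paragraphs to path as UTF-8, one per line (hypothetical helper)."""
    # Passing encoding='utf-8' means the platform default codec
    # ('gbk' in the traceback) is not used when writing.
    with io.open(path, 'w', encoding='utf-8') as f:
        for text in paragraphs:
            f.write(text + u'\n')


# Stand-ins for the strings returned by content.get_text();
# u'\xa0' is the character named in the error message.
sample = [u'Python\xa0is a programming language.', u'百度百科']
out_path = os.path.join(tempfile.gettempdir(), 'python_baike_demo.txt')
write_paragraphs(sample, out_path)
```

In the original loop this would correspond to replacing open('python.text','w') with io.open('python.text', 'w', encoding='utf-8'). If the file has to stay in GBK, an alternative is to replace the offending character before writing, for example content.get_text().replace(u'\xa0', u' ').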