|
import requests
from bs4 import BeautifulSoup
url='https://baike.baidu.com/item/%E8%8D%B7%E5%A1%98%E6%9C%88%E8%89%B2/9765753?fr=aladdin'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
html=requests.get(url,headers=headers)
html.encoding='utf-8'
sp=BeautifulSoup(html.text,'html.parser')
print(sp)
data1=sp.select("title")
data1.encoding='utf-8'
print(data1)
typelist=data1.find_all("div", {"class":"para"}),select("a"{href="/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017"})
print(typelist)
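Every other step works, but the step that grabs the tags keeps failing. As far as I can tell, there is nothing wrong with the tags I'm selecting. |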
|
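A few things in the tag-selection line look like the likely cause. `sp.select("title")` returns a list of matches rather than a single tag, so `find_all` cannot be called on it, and the `<title>` element would not contain the `div` elements anyway. The `,select(` (comma instead of a dot) and `"a"{href=...}` are also not valid Python syntax. Below is a minimal sketch of one way to do the lookup, searching from the parsed document itself. It runs against a small hypothetical HTML sample instead of the live page, and it assumes the real page still wraps its paragraphs in `div` elements with class `para`, as in the original snippet. The names `SAMPLE_HTML` and `find_author_links` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for html.text from requests.get(...).
SAMPLE_HTML = """
<html><head><title>荷塘月色</title></head>
<body>
<div class="para">作者 <a href="/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017">朱自清</a></div>
<div class="para">no link here</div>
<div class="other"><a href="/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017">outside</a></div>
</body></html>
"""

def find_author_links(html_text, href):
    """Return <a> tags with the given href that sit inside div.para elements."""
    soup = BeautifulSoup(html_text, "html.parser")
    links = []
    # find_all is called on the soup (or on a single Tag), not on a result list.
    for para in soup.find_all("div", {"class": "para"}):
        # Attribute filters are passed as keyword arguments or a dict.
        links.extend(para.find_all("a", href=href))
    return links

links = find_author_links(SAMPLE_HTML, "/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017")
print([a.get_text() for a in links])
```

With the real page, the same function could be called as `find_author_links(html.text, ...)` after the `requests.get` call. If it returns an empty list there, the page markup probably differs from the assumed `div.para` structure.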