import requests
from bs4 import BeautifulSoup
url='https://baike.baidu.com/item/%E8%8D%B7%E5%A1%98%E6%9C%88%E8%89%B2/9765753?fr=aladdin'
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
html=requests.get(url,headers=headers)
html.encoding='utf-8'
sp=BeautifulSoup(html.text,'html.parser')
print(sp)
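# select() returns a list of matching tags, here the page's <title>
data1=sp.select("title")
print(data1)
# Search the whole document (sp), not the title list: the CSS selector below
# matches <a> tags with this exact href inside any <div class="para">
typelist=sp.select('div.para a[href="/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017"]')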
print(typelist)
The only step I'm stuck on is extracting the tags, and I really don't know what to do. Could anyone help?
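For reference, here is a minimal sketch of the tag-extraction step on its own, run against a small hard-coded HTML fragment instead of the live page. The fragment, the `find_links` helper and the link text are made up for illustration. The class name `para` and the href are taken from the code above, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the downloaded page
SAMPLE_HTML = """
<html><head><title>Sample page</title></head>
<body>
<div class="para">Text with an <a href="/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017">author link</a>.</div>
<div class="para">Text with <a href="/item/other/1">another link</a>.</div>
<div class="note"><a href="/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017">outside para</a></div>
</body></html>
"""

TARGET = "/item/%E6%9C%B1%E8%87%AA%E6%B8%85/106017"


def find_links(html, href):
    """Return <a> tags with the given href that sit inside <div class="para">."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: descendant <a> of div.para whose href equals the target
    return soup.select(f'div.para a[href="{href}"]')


links = find_links(SAMPLE_HTML, TARGET)
for a in links:
    print(a.get_text(), a["href"])

# The same search written with find_all() instead of a CSS selector
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
same_links = [
    a
    for div in soup.find_all("div", class_="para")
    for a in div.find_all("a", href=TARGET)
]
```

Both approaches are called on the parsed document (or on a single tag), not on the list that `select()` returns, which is why `data1.find_all(...)` fails in the original code.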