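I'm new to Python and am practicing by scraping restaurant information from Dianping (大众点评) for data analysis, using the requests and BeautifulSoup libraries. When I scrape the sub-links of the restaurants on the listing page, I get a lot of junk links that I don't want. Could anyone help? Is my code wrong, or does Dianping have anti-scraping measures in place? I want to scrape each restaurant's name, star rating, category, average spend per person, taste score, and similar data. How should I go about it? Thanks.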
===============================
The code is as follows:
#! python3
# food.py - scrape data from Dianping to analyze restaurants in Nanjing
import requests
from bs4 import BeautifulSoup

url = 'https://www.dianping.com/search/category/5/0'
res = requests.get(url)
res.encoding = 'utf-8'
soup = BeautifulSoup(res.text, 'html.parser')
countPerPage = len(soup.select('.txt'))  # number of '.txt' blocks on the page
for i in range(countPerPage):
    # select() returns a list, so index it before reading 'href'
    subUrl = soup.select('.txt .tit a')[i]['href']
    print(subUrl)
s = input()  # wait for Enter before the script exits
====================================
The output is as follows:
/shop/58716774
http://t.dianping.com/deal/24823804
/shop/58716774#promo=10518642
/shop/58716774#waimai
/search/branch/5/0_58716774/g0
/business
/shop/32881142
http://t.dianping.com/deal/19774623
/business
/shop/66064359
http://t.dianping.com/deal/20448511
/shop/66064359#waimai
/business
/shop/22015043
http://t.dianping.com/deal/25121049
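
Judging from the output above, the `.txt .tit a` selector seems to match every link inside a title block (deal, promo, takeaway, branch and business links), not only the link to the shop page. One possible way to keep just the shop pages is to filter the hrefs afterwards. The sketch below is only an illustration: the helper name `shop_links`, the `/shop/<digits>` pattern and the base URL are assumptions drawn from the output listed here, not from any documented Dianping URL scheme, and the sample list is copied from that output so the snippet runs without network access.

```python
import re
from urllib.parse import urljoin

# Assumption: site root taken from the search URL used in the script.
BASE_URL = 'https://www.dianping.com'
# Assumption: a shop page looks like /shop/<digits> with no #fragment.
SHOP_PATH = re.compile(r'^/shop/\d+$')


def shop_links(hrefs):
    """Return absolute shop-page URLs, de-duplicated, in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if SHOP_PATH.match(href) and href not in seen:
            seen.add(href)
            result.append(urljoin(BASE_URL, href))
    return result


# Sample hrefs copied from the output above.
sample = [
    '/shop/58716774',
    'http://t.dianping.com/deal/24823804',
    '/shop/58716774#promo=10518642',
    '/shop/58716774#waimai',
    '/search/branch/5/0_58716774/g0',
    '/business',
    '/shop/32881142',
]

print(shop_links(sample))
```

In the script, the list passed to `shop_links` could be built from the `href` of each element returned by `soup.select('.txt .tit a')`. This only cleans up the link list. It does not address whether the site applies anti-scraping measures.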