Scrapy crawler: problem with 302 redirects

billbird · Posted on 2016-11-18 20:19:53

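I'm new to Python and have recently been learning to write web crawlers, but I've hit a wall. I've read through several articles online and still can't solve a 302 redirect problem.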

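I want to scrape news from a financial website. So far I can extract the link to each news item, but I get stuck when I try to use Scrapy to fetch the article body.
Adding the line handle_httpstatus_list = [301, 302] doesn't work either. My code is attached below; any help would be appreciated.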

import scrapy
from bs4 import BeautifulSoup


class investorcrawler(scrapy.Spider):
    name = 'investor'
    start_urls = ['http://ww2.money-link.com.tw/Product/Investor_Page/Investor_News.aspx']
    # Attempted fix: have 301/302 responses passed to the callbacks.
    handle_httpstatus_list = [301, 302]

    def parse(self, response):
        # Collect the first 10 links found on the news index page.
        domain = 'http://ww2.money-link.com.tw/Product/Investor_Page'
        res = BeautifulSoup(response.body, "html.parser")
        for news in res.findAll('a', href=True, limit=10):
            # print(domain + news.get('href'))
            yield scrapy.Request(domain + news.get('href'), callback=self.parse_detail)

    def parse_detail(self, response):
        # Print the article body of each news page.
        res = BeautifulSoup(response.body, "html.parser")
        print(res.select('#newsContent'))
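
A note on why the attempt above may behave this way: when 301/302 are listed in handle_httpstatus_list, Scrapy's redirect middleware stops following those redirects and passes the 3xx response itself (usually with an empty body) to the callback, so '#newsContent' would not be found. Removing that line lets Scrapy follow the redirect automatically. If the line is kept, the redirect has to be followed by hand. The following is only a minimal sketch of that second option, not a confirmed fix for this site: redirect_target is a hypothetical helper, and "/Error.aspx" is a made-up Location value used for illustration.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption, not site-specific).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, location):
    """Return the absolute URL a redirect points to, or None if there is none."""
    if status not in REDIRECT_STATUSES or not location:
        return None
    # Location may be relative, so resolve it against the URL that was requested.
    return urljoin(request_url, location)


# Hypothetical use inside the spider's parse_detail callback, when
# handle_httpstatus_list = [301, 302] is kept:
#
#     location = response.headers.get('Location')   # bytes or None in Scrapy
#     target = redirect_target(response.url, response.status,
#                              location.decode() if location else None)
#     if target:
#         yield scrapy.Request(target, callback=self.parse_detail)
#         return

if __name__ == "__main__":
    base = "http://ww2.money-link.com.tw/Product/Investor_Page/Investor_News.aspx"
    # "/Error.aspx" is an invented Location header, for illustration only.
    print(redirect_target(base, 302, "/Error.aspx"))
    print(redirect_target(base, 200, None))
```

Printing the resolved target is also a quick way to see where the site is redirecting to, which may hint at the real cause (for example a missing cookie or an incorrectly joined URL).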