Python scraper returns empty page source: looking for the cause and a solution (or a direction)

zcm · Posted 2015-8-11 13:05:49

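Question: why does fetching a web page's source return nothing, and how can I fix it (or at least where should I look)? Any pointers from the experts would be much appreciated!
(PS: While working on this, I typed the code directly into IDLE and ran it. Sometimes it returns the page source and sometimes it doesn't. Also, sometimes after I have run the program a few dozen times it does return the source, but when I then change the 2 in the URL to 3 (i.e. the next page), it stops working again. Very strange.)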

The code is as follows:
import urllib
import urllib2
import re

url = 'http://www.yingjiesheng.com/guangzhou-moreptjob-2.html'
req = urllib2.Request(url)
try:
    html = urllib2.urlopen(req).read()
    print html
except urllib2.HTTPError, e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except urllib2.URLError, e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    print 'No exception was raised.'
Screenshot of the output:

[Image: code output]

关大叔 · Posted 2015-8-12 10:41:53

import urllib2

url = 'http://www.yingjiesheng.com/guangzhou-moreptjob-2.html'
req = urllib2.Request(url)
try:
    response = urllib2.urlopen(req)
    content = response.read()
    # Decode the raw bytes as GBK before printing.
    print content.decode('gbk')
except urllib2.URLError, e:
    print 'We failed to reach a server.'
    # The attribute is lowercase "reason", not "Reason".
    if hasattr(e, 'reason'):
        print 'Reason: ', e.reason
else:
    # Belongs to the try block: runs only when nothing was raised.
    print 'No exception was raised.'

Just convert the encoding and it works.
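To illustrate the decoding step on its own, here is a minimal sketch in Python 3 (the thread's code is Python 2, so this is a translation of the idea, not the poster's code). The helper name `decode_page` and the candidate encoding list are hypothetical; the sample bytes stand in for a downloaded page so the snippet runs without network access.

```python
def decode_page(raw, encodings=("utf-8", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used).

    `decode_page` and the default candidate list are assumptions made
    for this illustration, not part of any library.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the final candidate and replace bad bytes.
    return raw.decode(encodings[-1], errors="replace"), encodings[-1]


# Stand-in for the bytes returned by response.read() on a GBK page.
sample = "广州兼职".encode("gbk")
text, used = decode_page(sample)
print(used, text)
```

Trying UTF-8 first and falling back to GBK is one possible ordering; if the site is known to serve GBK, calling `raw.decode('gbk')` directly, as in the reply, is simpler.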

zcm · Posted 2015-8-16 10:38:25

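Thanks for the help, 关大叔!
I have also kept looking into this over the past few days. I asked a friend who does Python development, and he said it may be related to network speed: switch to a faster Wi-Fi connection and retry a few times, and it should work.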
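The "retry a few times" advice can be automated. The following is a rough sketch in Python 3 (not the thread's Python 2), assuming that an empty body should be treated like a failed attempt. The function name `fetch_with_retry`, its parameters, and the default values are all hypothetical; the `opener` argument exists only so the function can be exercised without a real network.

```python
import time
import urllib.request


def fetch_with_retry(url, attempts=5, delay=2.0, timeout=10,
                     opener=urllib.request.urlopen):
    """Return the response body as bytes, retrying on errors or an empty body.

    All names and defaults here are illustrative assumptions.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            body = opener(url, timeout=timeout).read()
            if body:
                return body
            # The server answered, but with no content: count it as a failure.
            last_error = ValueError("empty response body")
        except OSError as e:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            last_error = e
        if attempt < attempts:
            time.sleep(delay)  # wait before the next attempt
    raise RuntimeError("giving up after %d attempts: %r" % (attempts, last_error))
```

A call such as `fetch_with_retry('http://www.yingjiesheng.com/guangzhou-moreptjob-2.html')` would then return the raw bytes, which still need to be decoded. If every attempt comes back empty, the cause is probably not network speed alone.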
