Last edited by novboy on 2017-5-11 15:47
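Hi everyone. I've recently been working on SEO analysis and optimization, which calls for some Python scripts. It has been a long time since I last touched Python, so I need to brush up on a lot of the basics.
Without further ado, here is the script, curl.py: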
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
import time
from io import BytesIO  # works on Python 2 and 3; pycurl writes bytes

import pycurl

# User-Agent strings; one is picked at random for each request
USERAGENT_LIST = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)',
    'Opera/9.20 (Windows NT 6.0; U; en)',
    'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.1; .NET CLR 1.1.4322)',
    'Opera/9.00 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0',
    'Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.02 [en]',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20060127 Netscape/8.1',
]


def curl(url, retry=False, delay=1, **kwargs):
    """Fetch url; return the body, or False on failure. kwargs are pycurl option names."""
    buf = BytesIO()
    c = pycurl.Curl()
    # No signals; per the libcurl docs, with the standard (synchronous)
    # resolver this also means DNS lookups are not covered by TIMEOUT.
    c.setopt(pycurl.NOSIGNAL, True)
    c.setopt(pycurl.FOLLOWLOCATION, True)  # follow redirects...
    c.setopt(pycurl.MAXREDIRS, 5)          # ...at most 5 times
    c.setopt(pycurl.TIMEOUT, 120)          # give up on the transfer after 120 s
    for key, value in kwargs.items():
        # option constants live on the pycurl module, not in locals()
        c.setopt(getattr(pycurl, key), value)
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    if 'USERAGENT' not in kwargs:
        c.setopt(pycurl.USERAGENT, random.choice(USERAGENT_LIST))
    if 'REFERER' not in kwargs:
        c.setopt(pycurl.REFERER, url)
    try:
        while True:
            try:
                c.perform()
                break
            except pycurl.error:
                if not retry:
                    return False
                time.sleep(delay)  # with retry=True this loops until success
    finally:
        c.close()
    return buf.getvalue()
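Looking option constants up with `locals()[key]` inside `curl()` cannot work: `locals()` holds only the function's own variables, while names brought in by `from pycurl import *` live in the module namespace returned by `globals()` (looking them up on the module with `getattr(pycurl, key)` works too). The minimal sketch below shows the difference without needing pycurl; `NOSIGNAL = 99` and the two `lookup_*` functions are hypothetical stand-ins for illustration only.

```python
# Hypothetical stand-in for a constant that `from pycurl import *` would
# bring into this module's namespace (the real value differs).
NOSIGNAL = 99


def lookup_with_locals(name):
    # locals() here contains only this function's variables: {'name': ...}
    return locals().get(name)


def lookup_with_globals(name):
    # globals() is the module namespace, where star-imported names end up
    return globals().get(name)


print(lookup_with_locals('NOSIGNAL'))   # None
print(lookup_with_globals('NOSIGNAL'))  # 99
```

With `locals()[key]` instead of `.get()`, the first lookup raises `KeyError`, which is what happens as soon as any keyword option is passed to `curl()`.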
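With `retry=True`, the `while` loop in `curl()` keeps retrying for as long as the request fails. A minimal sketch of a bounded alternative, assuming a hypothetical helper `call_with_retries` (not part of pycurl): it calls a function up to `attempts` times and, like `curl()`, returns False once every attempt has failed.

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds, at most `attempts` times.

    Returns func()'s result, or False once every attempt has raised one of
    `exceptions` (other exceptions propagate immediately).
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt < attempts - 1:
                time.sleep(delay)  # pause before the next attempt
    return False
```

Inside `curl()` it could replace the loop as `if call_with_retries(c.perform, attempts=3, delay=delay, exceptions=(pycurl.error,)) is False: return False`; since `c.perform()` returns None on success, the `is False` check tells the two outcomes apart.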
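Notes:
I have already installed the pycurl module with pip. I then wrote a small test script, test.py, to check whether the program above works, but it just keeps running without printing anything.
test.py is as follows: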
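When a call just keeps running, it helps to confirm whether it is really stuck and for how long. A minimal, stdlib-only sketch (the helper `run_with_deadline` is hypothetical): run the call in a daemon thread and stop waiting after a deadline. The worker thread is not killed, only abandoned; as a daemon it will not keep the interpreter alive at exit.

```python
import threading


def run_with_deadline(func, seconds):
    """Run func() in a background thread, waiting at most `seconds`.

    Returns (True, result) if func finished in time, or (False, None) if it is
    still running. An exception raised by func is re-raised in the caller.
    """
    box = {}

    def target():
        try:
            box['result'] = func()
        except Exception as exc:  # hand the error back to the caller
            box['error'] = exc

    worker = threading.Thread(target=target)
    worker.daemon = True  # an abandoned worker won't block interpreter exit
    worker.start()
    worker.join(seconds)
    if worker.is_alive():
        return False, None
    if 'error' in box:
        raise box['error']
    return True, box.get('result')
```

For example, `finished, body = run_with_deadline(lambda: curl.curl('www.baidu.com'), 10)` in test.py would report `finished == False` after 10 seconds instead of blocking silently; this assumes pycurl lets other Python threads run while `perform()` is waiting.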
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import curl  # import the curl.py module above

print(curl.curl('www.baidu.com'))
Output:
The message shown after terminating with Ctrl+D:
Now I'm stuck and can't figure out where it goes wrong.