What is the difference between urllib and urllib2 when decoding?

gary721400 · Posted on 2013-7-3 18:10:47

temp_url = "http://www.sina.com.cn"  # Sina's home page is already known to be encoded as GB2312

Using the urllib library, it works:

def gethtml(url):
    import urllib
    page = urllib.urlopen(url)
    html = page.read().decode('GB2312')
    print html       # prints correctly; I tested on both Linux and Windows

Using urllib2, it fails:

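def gethtml(url):
    import urllib2
    # `headers` is not defined anywhere in this post; it must be a dict set elsewhere
    temp_request = urllib2.Request(url, headers=headers)
    html = urllib2.urlopen(temp_request).read().decode('gbk')
    print html    # prints garbage on Linux; raises an error outright on Windows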

Where does the difference come from? Could someone who knows explain?

icymirror · Posted on 2013-10-18 10:54:48

From the Python online documentation:
urllib:
This module provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames. Some restrictions apply — it can only open URLs for reading, and no seek operations are available.
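In other words, it offers a high-level interface for fetching data from the web. urlopen() works much like the built-in open(), except that it takes URLs rather than filenames. It is also limited: it can only open URLs for reading, and the returned object cannot seek (reposition within the stream).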
urllib2:
The urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more.
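In other words, the urllib2 module defines functions and classes that help open URLs in more complex situations: basic and digest authentication, redirections, cookies, and more.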

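In short: urllib is the most basic interface for fetching data and does nothing beyond that, whereas urllib2 is closer to a wrapper around HTTP protocol operations and can be used in more situations.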
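
One possible explanation for the garbled output, not confirmed anywhere in this thread: the urllib2 version sends custom `headers`, and if those include `Accept-Encoding: gzip` the server may return a gzip-compressed body, which cannot be decoded as GBK directly. The sketch below simulates that case offline. The helper name `decode_body` and the sample text are made up for illustration, and in real code the `content_encoding` value would come from the response's `Content-Encoding` header. It is written to run on both Python 2.7 and Python 3.

```python
import gzip
import io


def decode_body(raw, content_encoding=None, charset="gbk"):
    """Decode response bytes, decompressing first if they are gzip data.

    Hypothetical helper: `content_encoding` stands in for the value of the
    response's Content-Encoding header.
    """
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if content_encoding == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw.decode(charset)


# Simulated page content (assumption: a short GBK-encoded string).
text = u"\u65b0\u6d6a\u9996\u9875"
plain = text.encode("gbk")

# Simulated gzip-compressed response body.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(plain)
compressed = buf.getvalue()

# Decoding compressed bytes directly is what goes wrong.
try:
    compressed.decode("gbk")
except UnicodeDecodeError as exc:
    print("direct decode failed: %s" % type(exc).__name__)

# Decompressing first recovers the original text.
print(decode_body(compressed, "gzip") == text)
```

Note that GBK is a superset of GB2312, so switching between `'GB2312'` and `'gbk'` is unlikely to be what breaks the second snippet; printing the response headers would show whether compression is involved.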
