Python 3 encoding problem

黑夜里的黑喵 · posted on 2018-1-6 13:16:17

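I have always written Python 2.7 until now. Last night I tried to write a crawler for 捧腹网 (pengfu.com) in Python 3.6, and one problem kept me stuck the whole evening, so I am asking for help here.
The crawler is not finished yet; I want to solve the problems one at a time. The code is below: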

# -*- coding:utf-8 -*-
import urllib.request,re

# fetch the page source
def page(pg):
    url = 'https://www.pengfu.com/index_%s.html'%pg
    html = urllib.request.urlopen(url).read()  # read the entire page source
    return html

# title
def title(html):
    html = page(1)
    html = html.decode('utf-8')  # python3.x
    reg = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>')  # regex: .*? matches any characters (non-greedy)
    item = re.findall(reg,html)  # find all matches
    return item

# picture
def content(html):
    reg = r'<img src="(.*?)" width'
    item = re.findall(reg,html)
    item = item.decode('utf-8')
    return item

# download
def download(url,name):
    path = 'H:\python\image\%s.jpg'%name.decode('utf-8').encode('gbk')
    urllib.request.urlretrieve(url,path)

for i in range(1,6):
    html = page(i)
    html = html.decode('utf-8')
    title_list = title(html)  # image names
    title_list = title_list.decode('utf-8')
    content_list = content(html)
    content_list = content_list.decode('utf-8')
    for i,z in zip(title_list,content_list).itervalues():
        download(z,i)
        print(i,z)
b = title()

Here is the error:
E:\python3.6\python.exe C:/Users/Administrator/PycharmProjects/spider/pengfu/__init__.py
Traceback (most recent call last):
File "C:/Users/Administrator/PycharmProjects/spider/pengfu/__init__.py", line 34, in <module>
title_list = title_list.decode('utf-8')
AttributeError: 'list' object has no attribute 'decode'

Process finished with exit code 1
I wanted to attach a screenshot. Hm, why can't I upload the image?
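
A note on the traceback above, for anyone who lands here with the same error. In Python 3 only `bytes` objects have `.decode()`; `re.findall` on a `str` returns a `list` of `str`, so there is nothing left to decode. The small sketch below illustrates this, assuming the page is UTF-8; the HTML snippet is made up for illustration and is not copied from the real site.

```python
import re

# Stand-in for what urllib.request.urlopen(url).read() returns: raw bytes.
# This markup is a made-up sample, not the real pengfu.com page.
raw = '<h1 class="dp-b"><a href="/a/1.html" target="_blank">笑话一</a></h1>'.encode('utf-8')

html = raw.decode('utf-8')  # bytes -> str, done exactly once
titles = re.findall(r'target="_blank">(.*?)</a>', html)  # list of str

print(type(raw).__name__, type(html).__name__, type(titles).__name__)
print(titles)

# Only bytes has decode(); str and list do not in Python 3.
print(hasattr(raw, 'decode'), hasattr(html, 'decode'), hasattr(titles, 'decode'))
```

So the usual pattern is to decode once, right after `read()`, and treat everything after that as text.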

黑夜里的黑喵 · posted on 2018-1-7 23:17:44

While the post was waiting for moderation I solved the problem myself. Here is the code for everyone to look at.

# -*- coding: utf-8 -*-
import os
import re
import urllib.request


# fetch the page source
def page(pg):
    url = 'https://www.pengfu.com/index_%s.html' % pg
    html = urllib.request.urlopen(url).read()  # read() returns bytes in Python 3
    return html.decode('utf-8')  # decode once, right after the download


# title
def title(html):
    # regex: .*? matches any characters (non-greedy)
    reg = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>')
    return re.findall(reg, html)  # list of str, nothing left to decode


# picture
def content(html):
    reg = r'<img src="(.*?)" width'
    return re.findall(reg, html)


# download
def download(url, name):
    # name is already a str; Python 3 passes Unicode file names to Windows directly
    path = os.path.join(r'H:\学习\Python\腾讯课堂学习python\image', '%s.jpg' % name)
    urllib.request.urlretrieve(url, path)


for pg in range(1, 6):
    html = page(pg)
    title_list = title(html)  # image names
    content_list = content(html)
    for name, url in zip(title_list, content_list):  # zip() has no itervalues()
        download(url, name)
        print(name, url)
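
For readers who want to try the parsing step without hitting the site, here is a minimal offline sketch. It reuses the two regular expressions from the post; the `parse` and `image_path` helpers, the sample HTML, and the `image` folder name are assumptions made up for illustration. It also replaces characters that Windows does not allow in file names, since joke titles may contain `?` or `:`.

```python
import os
import re

# The two patterns from the post, compiled once.
TITLE_RE = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>')
IMG_RE = re.compile(r'<img src="(.*?)" width')


def parse(html):
    """Return (title, image_url) pairs from an already-decoded page (hypothetical helper)."""
    titles = TITLE_RE.findall(html)
    images = IMG_RE.findall(html)
    # zip() yields tuples directly; there is no itervalues() to call on it.
    return list(zip(titles, images))


def image_path(folder, name):
    """Build a .jpg path, replacing characters Windows forbids in file names (hypothetical helper)."""
    safe = re.sub(r'[\\/:*?"<>|]', '_', name).strip()
    return os.path.join(folder, safe + '.jpg')


# Made-up sample markup shaped like what the patterns expect; not the real page.
sample = (
    '<h1 class="dp-b"><a href="/a/1.html" target="_blank">标题一</a></h1>'
    '<img src="https://example.com/1.jpg" width="200">'
    '<h1 class="dp-b"><a href="/a/2.html" target="_blank">what?</a></h1>'
    '<img src="https://example.com/2.jpg" width="200">'
)

pairs = parse(sample)
for name, url in pairs:
    print(image_path('image', name), url)
```

Note that `zip` stops at the shorter list, so if a page has more `<img>` tags than titles the pairing can drift; checking that both lists have the same length before downloading is a cheap safeguard.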

