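Until now I have always written Python 2.7. Last night I tried to write a crawler for pengfu.com (捧腹网) in Python 3.6, and I spent the whole evening on one problem without solving it, so I am asking for help here.
The crawler is not finished yet; I want to work through the problems one at a time. The code is below: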
- # -*- coding:utf-8 -*-
- import urllib.request,re
- # fetch the page source
- def page(pg):
-     url = 'https://www.pengfu.com/index_%s.html'%pg
-     html = urllib.request.urlopen(url).read()  # read the entire page source
-     return html
- # title
- def title(html):
-     html = page(1)
-     html = html.decode('utf-8')  # python3.x
-     reg = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>')  # regex: .*? matches any run of characters (non-greedy)
-     item = re.findall(reg,html)  # match
-     return item
- # picture
- def content(html):
-     reg = r'<img src="(.*?)" width'
-     item = re.findall(reg,html)
-     item = item.decode('utf-8')
-     return item
- # download
- def download(url,name):
-     path = 'H:\python\image\%s.jpg'%name.decode('utf-8').encode('gbk')
-     urllib.request.urlretrieve(url,path)
- for i in range(1,6):
-     html = page(i)
-     html = html.decode('utf-8')
-     title_list = title(html)  # image names
-     title_list = title_list.decode('utf-8')
-     content_list = content(html)
-     content_list = content_list.decode('utf-8')
-     for i,z in zip(title_list,content_list).itervalues():
-         download(z,i)
-         print(i,z)
- b = title()
Here is the error:
E:\python3.6\python.exe C:/Users/Administrator/PycharmProjects/spider/pengfu/__init__.py
Traceback (most recent call last):
  File "C:/Users/Administrator/PycharmProjects/spider/pengfu/__init__.py", line 34, in <module>
    title_list = title_list.decode('utf-8')
AttributeError: 'list' object has no attribute 'decode'
Process finished with exit code 1
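The `AttributeError` above can be reproduced without touching the network. The sketch below uses a made-up HTML fragment (`raw` is hypothetical and stands in for what `urlopen(url).read()` returns). It suggests that in Python 3 only the `bytes` object needs decoding: `re.findall` on a `str` already returns a `list` of `str`, and neither `list` nor `str` has a `.decode` method.

```python
import re

# Hypothetical sample standing in for the bytes returned by urlopen(url).read()
raw = '<h1 class="dp-b"><a href="/content_1.html" target="_blank">示例标题</a>'.encode('utf-8')

# bytes -> str: this is the only decode step needed in Python 3
html = raw.decode('utf-8')

pattern = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>')
titles = pattern.findall(html)  # a list of str, not bytes

try:
    titles.decode('utf-8')  # the same call the traceback points at
    error = None
except AttributeError as exc:
    error = str(exc)

print(type(titles).__name__, titles)
print(error)
```

In Python 2.7 the matches would have been byte strings, which is why calling `.decode` on each item looked natural there. Calling it on the list itself fails in both versions.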
I wanted to attach a screenshot. Huh, why can't I upload the image?
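For reference, here is a minimal Python 3 sketch of the same crawler, not a tested fix. It keeps the URL pattern and both regular expressions from the code above, decodes the response exactly once, and drops the Python 2 leftovers: `.decode` on lists and strings, the `gbk` re-encoding of the file name, and `.itervalues()`, which is a Python 2 `dict` method that `zip` never had. The names `fetch_page`, `parse_titles`, `parse_images`, `safe_name`, `image_path` and `main` are introduced here for illustration. Whether the regexes still match the live site, and whether titles and images really pair up by position, are assumptions.

```python
import os
import re
import urllib.request

# Patterns copied from the original post
TITLE_RE = re.compile(r'<h1 class="dp-b"><a href=".*?" target="_blank">(.*?)</a>')
IMG_RE = re.compile(r'<img src="(.*?)" width')


def fetch_page(pg):
    """Download one index page and return it as str (decoded exactly once)."""
    url = 'https://www.pengfu.com/index_%s.html' % pg
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')


def parse_titles(html):
    """Return the post titles as a list of str."""
    return TITLE_RE.findall(html)


def parse_images(html):
    """Return the image URLs as a list of str."""
    return IMG_RE.findall(html)


def safe_name(title):
    """Hypothetical helper: replace characters Windows forbids in file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()


def image_path(folder, title):
    """Build the target path; Python 3 str paths need no manual gbk encoding."""
    return os.path.join(folder, safe_name(title) + '.jpg')


def main(folder=r'H:\python\image', pages=range(1, 6)):
    """Crawl the given pages and save each image under its title."""
    for pg in pages:
        html = fetch_page(pg)
        # zip pairs titles and images by position (an assumption about the page)
        for title, img_url in zip(parse_titles(html), parse_images(html)):
            urllib.request.urlretrieve(img_url, image_path(folder, title))
            print(title, img_url)
```

Calling `main()` starts the crawl. The folder is written as a raw string so that the backslashes in the Windows path are not read as escape sequences. The trailing `b = title()` from the original is left out, since `title` requires one argument and that call would raise a `TypeError`.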