

[Solved] A question about web scraping with requests and BeautifulSoup

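求知路上 posted on 2017-4-26 00:41:36
I'm new to web scraping and just wrote my first bit of code. I'm on Python 3.5.1, and running it throws a wall of errors. I'm at my wit's end. Could someone please tell me what's going on?

import requests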
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'accept-encoding': 'gzip, deflate, sdch, br',
    'accept-language': 'zh-CN,zh;q=0.8'
}
url = 'http://http://bbs.tianya.cn/'
r = requests.get(url, headers=headers).content
soup=BeautifulSoup.findAll('div',{"class":"title"})



The error output is as follows:
Traceback (most recent call last):
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 356, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1128, in _send_request
    self.endheaders(body)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1079, in endheaders
    self._send_output(message_body)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 911, in _send_output
    self.send(msg)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 854, in send
    self.connect()
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\connection.py", line 166, in connect
    conn = self._new_conn()
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
requests.packages.urllib3.exceptions.NewConnectionError: <requests.packages.urllib3.connection.HTTPConnection object at 0x0000000002FFBB38>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\adapters.py", line 423, in send
    timeout=timeout
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 649, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\packages\urllib3\util\retry.py", line 376, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
requests.packages.urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: //bbs.tianya.cn/ (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000002FFBB38>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/cong/11.py", line 11, in <module>
    r =requests.get(rul, headers=headers).content
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\cong\AppData\Local\Programs\Python\Python35\lib\site-packages\requests\adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: //bbs.tianya.cn/ (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000002FFBB38>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))




It's so long. What on earth is going on? I'm a beginner, so please forgive anything I've gotten wrong.
Min-Coco posted on 2017-4-27 09:03:10
First of all, your URL is wrong! You've got two 'http's in there, for crying out loud!!!!

import requests
from bs4 import BeautifulSoup


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'accept-encoding': 'gzip, deflate, sdch, br',
    'accept-language': 'zh-CN,zh;q=0.8'
}
url = 'http://bbs.tianya.cn/'
r = requests.get(url, headers=headers).content
bs = BeautifulSoup(r)  # added this line: build a soup object from the downloaded bytes
soup=bs.findAll('div',{"class":"title"})
print(soup)


I'm a beginner too; this is just how I would go about getting it to print something.
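
To see why the traceback reports host='http' and url //bbs.tianya.cn/, it can help to split both URLs with the standard library. This is only a sketch: requests uses its own URL parser internally, so urllib.parse is a stand-in here, but it splits the doubled scheme the same way the traceback shows.

```python
from urllib.parse import urlsplit

# The URL from the question (scheme typed twice) and the corrected one.
bad = urlsplit('http://http://bbs.tianya.cn/')
good = urlsplit('http://bbs.tianya.cn/')

# With the doubled scheme, the second "http:" lands where the host belongs,
# so the name handed to DNS is "http" and the real domain ends up in the path.
print(bad.hostname, bad.path)    # http //bbs.tianya.cn/
print(good.hostname, good.path)  # bbs.tianya.cn /
```

A host called "http" cannot be resolved, which is what "getaddrinfo failed" at the bottom of each traceback is reporting.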
newlive posted on 2017-4-27 09:16:34
soup = BeautifulSoup(r, 'xxxx')
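I won't say much. The OP hasn't even read the most basic documentation, so there's no point in me saying more, and if I went deeper you wouldn't follow it anyway.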
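
For anyone landing here later, the 'xxxx' above stands for a parser name. Below is a minimal sketch assuming the built-in 'html.parser' (other parsers such as lxml need a separate install). The HTML string is made up for illustration and is not the real tianya page, so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page content.
html = """
<div class="title"><a href="/a">First post</a></div>
<div class="other">ignored</div>
<div class="title"><a href="/b">Second post</a></div>
"""

# Create a BeautifulSoup *instance* first, naming the parser explicitly;
# find_all is then called on that instance, not on the class itself.
soup = BeautifulSoup(html, 'html.parser')
titles = [div.get_text(strip=True) for div in soup.find_all('div', class_='title')]
print(titles)
```

In the original code, BeautifulSoup.findAll(...) was called on the class without ever passing it the page content, so even with a correct URL that line would have failed.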