[Help] Scraped web table rows come out misaligned in Excel, with odd duplication; please advise

zesi111 posted on 2020-10-10 19:26:43
The table on the website looks like this (from a screenshot), followed by my program:
Reporting Date | Hedge Fund | Shares Held | Market Value | % of Portfolio | Quarterly Change in Shares | Ownership in Company
10/9/2020 | Envestnet Asset Management Inc. | 2,174 | $0.22M | 0.0% | N/A | 0.003%

import requests
from bs4 import BeautifulSoup
import xlwt

# Request headers that mimic a Chrome browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}


def get_data():
    response = requests.get('https://www.marketbeat.com/stocks/NYSE/BILL/institutional-ownership/', headers=headers)
    bs = BeautifulSoup(response.text, 'lxml')

    # Header cells
    title = bs.find_all('th')
    data_list_title = []  # start with an empty list
    for data in title:
        data_list_title.append(data.text.strip())  # take the tag text, strip surrounding whitespace, append to the list

    # Body cells
    content = bs.find_all('td')
    data_list_content = []  # start with an empty list
    for data in content:
        data_list_content.append(data.text.strip())  # take the tag text, strip surrounding whitespace, append to the list

    # List comprehension: slice the flat cell list into chunks of 16 cells, one chunk per spreadsheet row
    new_list = [data_list_content[i:i + 16] for i in range(0, len(data_list_content), 16)]

    # Write to an Excel workbook
    book = xlwt.Workbook()
    sheet1 = book.add_sheet('sheet1', cell_overwrite_ok=True)

    # Write the header row
    heads = data_list_title[:]  # copy of data_list_title, first element to last
    ii = 0
    for head in heads:
        sheet1.write(0, ii, head)
        ii += 1

    # Write the data rows
    i = 1
    for row in new_list:
        j = 0
        for data in row:
            sheet1.write(i, j, data)
            j += 1
        i += 1

    # Save the file
    book.save('./data.xls')


# Run
get_data()
print("All done")

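The resulting Excel sheet is shown below. The rows are misaligned, and each line holds two records side by side. Could anyone advise? Thanks!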
Reporting Date | Hedge Fund | Shares Held | Market Value | % of Portfolio | Quarterly Change in Shares | Ownership in Company
10/9/2020 | Envestnet Asset Management Inc. | 2,174 | $0.22M | 0.0% | N/A | 0.003% | 10/6/2020 | Avitas Wealth Management LLC | 5,410 | $0.54M | 0.2% | +57.2% | 0.007%
9/28/2020 | Manchester Capital Management LLC | 1,500 | $0.14M | 0.0% | +50.0% | 0.002% | 9/22/2020 | Atria Investments LLC | 31,463 | $2.84M | 0.1% | N/A | 0.039%
9/15/2020 | Two Sigma Advisers LP | 5,800 | $0.52M | 0.0% | N/A | 0.007% | 9/15/2020 | Schonfeld Strategic Advisors LLC | 62,798 | $5.67M | 0.1% | N/A | 0.078%
9/4/2020 | Principal Financial Group Inc. | 2,390 | $0.22M | 0.0% | N/A | 0.003% | 8/27/2020 | Neuberger Berman Group LLC | 34,087 | $3.08M | 0.0% | +88.1% | 0.047%
8/25/2020 | Nuveen Asset Management LLC | 96,953 | $8.75M | 0.0% | +192.5% | 0.134% | 8/20/2020 | Charles Schwab Investment Management Inc. | 95,013 | $8.57M | 0.0% | N/A | 0.131%
8/18/2020 | Blackstone Group Inc | 172,686 | $15.58M | 0.1% | N/A | 0.238% | 8/17/2020 | Engineers Gate Manager LP | 5,927 | $0.54M | 0.0% | N/A | 0.008%
8/17/2020 | California State Teachers Retirement System | 27,675 | $2.50M | 0.0% | +66.5% | 0.038% | 8/17/2020 | Townsquare Capital LLC | 40,538 | $3.37M | 0.2% | N/A | 0.056%



王自信 posted on 2020-10-13 09:59:59
Should the 16 be 8?
[Attached image: 1602554314(1).jpg]
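
One way to read the hint above: the chunk size passed to the slicing step should equal the number of td cells in a single table row, otherwise several records end up on one spreadsheet line. The sketch below only illustrates that effect on made-up data. The helper name chunk_rows and the three-column sample are hypothetical, and the thread does not confirm that 8 is the right width for this page (the visible header has 7 columns, so a hidden cell may be involved).

```python
def chunk_rows(cells, row_width):
    """Split a flat list of cell texts into rows of row_width cells each."""
    if row_width <= 0:
        raise ValueError("row_width must be positive")
    return [cells[i:i + row_width] for i in range(0, len(cells), row_width)]


# Hypothetical sample: two records of 3 cells each, flattened the way
# bs.find_all('td') flattens a table.
flat = ["10/9/2020", "Fund A", "2,174", "10/6/2020", "Fund B", "5,410"]

too_wide = chunk_rows(flat, 6)  # width is double the real row: two records per line
correct = chunk_rows(flat, 3)   # width matches the real row: one record per line

print(too_wide)
print(correct)
```

With a width that is twice the real row length, each output row carries two records side by side, which matches the symptom in the spreadsheet above.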
sheeboard posted on 2020-10-13 10:41:08
Last edited by sheeboard on 2020-10-13 11:31

It is best to extract the data step by step, in table -> tr -> td order.
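
A minimal sketch of the table -> tr -> td approach, assuming the data sits in an ordinary HTML table. SAMPLE_HTML is a hypothetical stand-in for response.text (the real page markup may differ), table_to_rows is an illustrative helper name, and html.parser is used here instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; the real page markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Reporting Date</th><th>Hedge Fund</th><th>Shares Held</th></tr>
  <tr><td>10/9/2020</td><td>Envestnet Asset Management Inc.</td><td>2,174</td></tr>
  <tr><td>10/6/2020</td><td>Avitas Wealth Management LLC</td><td>5,410</td></tr>
</table>
"""


def table_to_rows(html):
    """Return (header_cells, data_rows), reading the first table one tr at a time."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return [], []
    header_cells, data_rows = [], []
    for tr in table.find_all("tr"):
        th_cells = [th.get_text(strip=True) for th in tr.find_all("th")]
        td_cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if th_cells and not header_cells:
            header_cells = th_cells      # first row that contains th cells
        if td_cells:
            data_rows.append(td_cells)   # one list per tr, whatever its width
    return header_cells, data_rows


header_cells, data_rows = table_to_rows(SAMPLE_HTML)
print(header_cells)
for row in data_rows:
    print(row)
```

Because each tr yields its own list, the row width never has to be guessed, and each list can be written to the sheet with sheet1.write(i, j, data) as in the original program.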

[Attachment: result.xlsx, 11.12 KB]
