Problem: I want to use BeautifulSoup to scrape the data inside a <table> tag (shown below) and extract all of the data in the table.
- <table width="100%" cellpadding="0" cellspacing="0" class="winning-lottery-table">
- <thead>
- <tr>
- <th width="33.3%"><span class="text1">期号</span><span class="text2">开奖号</span><span class="text1">十位</span><span class="text1">个位</span><span class="text1">后三</span></th>
- <th width="33.3%"><span class="text1">期号</span><span class="text2">开奖号</span><span class="text1">十位</span><span class="text1">个位</span><span class="text1">后三</span></th>
- <th><span class="text1">期号</span><span class="text2">开奖号</span><span class="text1">十位</span><span class="text1">个位</span><span class="text1">后三</span></th>
- </tr>
- </thead>
- <tbody id="cqssc_draw_list_tbody"><tr><td><span class="text1">001</span><span id="draw_td_001"><span class="text2 red_big">1 3 2 1 7</span><span class="text1">小单</span><span class="text1"><span class="orange">大</span>单</span><span class="text1">组六</span></span></td><td><span class="text1">041</span><span id="draw_td_041"><span class="text2 red_big">7 6 6 4 7</span><span class="text1">小<span class="orange">双</span></span><span class="text1"><span class="orange">大</span>单</span><span class="text1">组六</span></span></td><td><span class="text1">081</span>7 4 6</span><span class="text1">小<span class="orange">双</span></span><span class="text1"><span class="orange">大</span><span class="orange">双</span></span><span class="text1">组六</span></span></td><td><span class="text1">079</span><span id="draw_td_079"><span class="text2 red_big">5 1 3 1 3</span><span class="text1">小单</span><span class="text1">小单</span><span class="text1"><span class="orange">组三</span></span></span></td><td><span class="text1">119</span><span id="draw_td_119"></span></td></tr><tr class="bgcolor"><td><span class="text1">040</span></td></tr></tbody>
- </table>
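For reference, here is a minimal sketch of extracting every value from a table shaped like the sample above. It uses a swapped-in technique: Python 3's standard-library html.parser instead of BeautifulSoup, so it runs without extra packages. LotteryTableParser and extract_rows are hypothetical names. The rule that each span with class text1 or text2 holds one value is an assumption read off the sample markup.

```python
from html.parser import HTMLParser

# Assumption based on the sample markup: each value in a cell sits in its
# own <span class="text1"> or <span class="text2 ...">; nested spans such
# as <span class="orange"> only style part of that value.
FIELD_CLASSES = {"text1", "text2"}


class LotteryTableParser(HTMLParser):
    """Hypothetical helper: collect cell values from <tr>/<td>/<th> markup."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cells
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # value strings of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        elif tag == "span" and self._cell is not None:
            classes = (dict(attrs).get("class") or "").split()
            if FIELD_CLASSES.intersection(classes):
                self._cell.append("")  # a field span starts a new value

    def handle_data(self, data):
        if self._cell is None:
            return  # text outside any cell (e.g. newlines) is ignored
        if not self._cell:
            self._cell.append("")
        self._cell[-1] += data  # nested spans extend the current value

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # collapse runs of whitespace and drop empty values
            fields = (" ".join(f.split()) for f in self._cell)
            self._row.append([f for f in fields if f])
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return [[cell_values, ...], ...] for every <tr> in html."""
    parser = LotteryTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sample = (
        '<table class="winning-lottery-table">'
        '<tbody id="cqssc_draw_list_tbody"><tr>'
        '<td><span class="text1">001</span><span id="draw_td_001">'
        '<span class="text2 red_big">1 3 2 1 7</span>'
        '<span class="text1">小单</span>'
        '<span class="text1"><span class="orange">大</span>单</span>'
        '<span class="text1">组六</span></span></td>'
        '<td><span class="text1">119</span><span id="draw_td_119"></span></td>'
        '</tr></tbody></table>'
    )
    for row in extract_rows(sample):
        for cell in row:
            print(cell)
```

Each row comes back as a list of cells, and each cell as a list of its values (issue number, draw numbers, then the labels). The sample's third cell (issue 081) has unbalanced </span> tags, so a cell like that may come back with merged values.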
My code looks like this:
- # -*- coding: utf-8 -*-
- """
- Created on Fri Jan 20 15:17:21 2017
- @author: Administrator
- """
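- from bs4 import BeautifulSoup
- import urllib2  # Python 2 only; in Python 3 this module became urllib.request
- url = "http://baidu.lecai.com/lottery/draw/sorts/cqssc.php?agentId=5591"
- # download the page and parse it with Python's built-in HTML parser
- response = urllib2.urlopen(url)
- soup = BeautifulSoup(response, 'html.parser')
- # find_all returns a list (ResultSet) of every <table> Tag on the page
- dt = soup.find_all('table')
- print dt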
Part of the output is shown below:
- [<table cellpadding="0" cellspacing="0" class="winning-lottery-table" width="100%">\n<thead>\n<tr>\n<th width="33.3%"><span class="text1">\u671f\u53f7</span><span class="text2">\u5f00\u5956\u53f7</span><span class="text1">\u5341\u4f4d</span><span class="text1">\u4e2a\u4f4d</span><span class="text1">\u540e\u4e09</span></th>\n<th width="33.3%"><span class="text1">\u671f\u53f7</span><span class="text2">\u5f00\u5956\u53f7</span><span class="text1">\u5341\u4f4d</span><span class="text1">\u4e2a\u4f4d</span><span class="text1">\u540e\u4e09</span></th>\n<th><span class="text1">\u671f\u53f7</span><span class="text2">\u5f00\u5956\u53f7</span><span class="text1">\u5341\u4f4d</span><span class="text1">\u4e2a\u4f4d</span><span class="text1">\u540e\u4e09</span></tbody>\n</table>]
- [Finished in 0.9s]
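This looks like an encoding problem. How should I fix it? Any help from the experts here would be much appreciated!
Looking for a solution.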
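The \uXXXX sequences in the output look like escape notation for the same Chinese text rather than damaged data: \u671f\u53f7, for instance, is 期号. Below is a small Python 3 sketch that turns the escapes copied from the printed header back into characters, and goes the other way. decode_escapes and encode_escapes are hypothetical helper names.

```python
# Escapes copied verbatim from the printed <th> header in the output above.
ESCAPED_HEADERS = [
    r"\u671f\u53f7",
    r"\u5f00\u5956\u53f7",
    r"\u5341\u4f4d",
    r"\u4e2a\u4f4d",
    r"\u540e\u4e09",
]


def decode_escapes(text):
    """Turn literal \\uXXXX sequences back into the characters they name."""
    # The escaped text is plain ASCII, so encode it to bytes and let the
    # unicode_escape codec interpret the backslash sequences.
    return text.encode("ascii").decode("unicode_escape")


def encode_escapes(text):
    """Inverse: render non-ASCII characters as \\uXXXX, as the output did."""
    return text.encode("unicode_escape").decode("ascii")


if __name__ == "__main__":
    for escaped in ESCAPED_HEADERS:
        print(escaped, "->", decode_escapes(escaped))
```

If the decoded strings match the header on the page, the data itself is intact and only its printed form differs. A likely cause, worth confirming against the installed bs4 version, is that print dt prints a list. A list shows each element's repr, and under Python 2 BeautifulSoup 4 renders a Tag's repr with the unicode-escape codec, which would also explain the literal \n. Printing each tag on its own (for t in dt: print t) goes through the tag's str() instead, which should output UTF-8 text, provided the console can display it.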