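Shanghai's weather changes quickly, so most people keep an umbrella at the office. The Shanghai Weather Information Network (http://www.soweather.com/index.html) is the authoritative local weather site. It shows the forecast for the next 5 days plus hour-by-hour data for the current day.
The hour-by-hour data is especially handy for sudden changes in the weather. At the time, though, the site had no WeChat account or app, so the only way to see it was to open the website. So I wrote a small scraper to grab it.
Python 2.7 environment: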
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import urllib2
    from bs4 import BeautifulSoup

    # Fetch and parse the "today" page.
    html_doc = "http://www.soweather.com/todayweather.html"
    page = urllib2.urlopen(html_doc)
    soup = BeautifulSoup(page, "html.parser")
    #print soup.prettify()

    # Output is re-encoded to GBK for a Chinese Windows console.
    today = '\n' + '今日天气' + '\n'          # "Today's weather"
    print today.decode('utf-8').encode('gbk', 'ignore')
    todaylist = soup.find(class_="left1a")
    todaystr = todaylist.get_text()
    print todaystr.encode('gbk', 'ignore')

    fiveday = '\n' + '未来五天预报' + '\n'    # "Five-day forecast"
    print fiveday.decode('utf-8').encode('gbk', 'ignore')
    for one in soup(class_="sh5dayb sh5daybborder"):
        today = one.get_text()
        today_str = ""
        lineNum = 0
        # Join the lines of one day's block into a single line.
        for line in today.splitlines():
            line = line.lstrip()
            # No separator after lines 0 and 5; a space after all others.
            if (lineNum == 0 or lineNum == 5):
                endLine = ''
            else:
                endLine = ' '
            line = line.rstrip() + endLine
            today_str = today_str + line
            lineNum = lineNum + 1
        #print today_str.__class__
        print today_str.encode('gbk', 'ignore')

    hour = '\n\n' + '逐小时预报' + '\n'        # "Hourly forecast"
    print hour.decode('utf-8').encode('gbk', 'ignore')
    # Column headers: time, temperature, humidity, wind speed, rainfall, feels-like
    name = '时间' + '\t' + '温度' + '\t' + '湿度' + '\t' + '风速' + '\t' + '雨量' + '\t' + '体感'
    print name.decode('utf-8').encode('gbk', 'ignore')
    hour_list = soup.find(class_="childta")
    ul_list = hour_list('tr')
    hour_num = 0
    for mytext in ul_list:
        mytext = mytext.get_text()
        hour_str = ""
        num = 0
        # Keep only the text lines that hold the six columns printed above.
        for line in mytext.splitlines():
            line = line.rstrip() + '\t'
            if (num == 1 or num == 6 or num == 7 or num == 9 or num == 10 or num == 11):
                hour_str = hour_str + line
            num = num + 1
        # Skip the first row, which is the table's own header.
        if (hour_num != 0):
            print hour_str.encode('gbk', 'ignore')
        hour_num = hour_num + 1

    # Keep the console window open until Enter is pressed.
    raw_input()
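The hourly part of the script above picks fields by their line position in each row's text, which depends on the page's exact whitespace. As a rough alternative, here is a minimal Python 3 sketch that selects table cells by column index instead. It is only an illustration under assumptions: it uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages, the `RowCollector` and `hourly_rows` names are made up for this example, and the sample markup is invented because the real page layout is not reproduced here.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text pieces of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def hourly_rows(html, wanted=(0, 1, 2)):
    """Return the data rows (header row skipped), keeping only wanted columns."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    data = parser.rows[1:]  # the first row is the header, as in the script above
    return [[row[i] for i in wanted if i < len(row)] for row in data]


# Invented sample markup for illustration only (not copied from the site).
SAMPLE = """
<table class="childta">
  <tr><th>时间</th><th>天气</th><th>温度</th><th>湿度</th></tr>
  <tr><td>08:00</td><td>多云</td><td>21</td><td>80%</td></tr>
  <tr><td>09:00</td><td>小雨</td><td>22</td><td>85%</td></tr>
</table>
"""

# Keep the time, temperature, and humidity columns of the sample.
for row in hourly_rows(SAMPLE, wanted=(0, 2, 3)):
    print("\t".join(row))
```

Against the real page, the column indices passed as `wanted` would have to be worked out by inspecting the table, just as the line numbers in the original script were.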
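These days the site has both a WeChat account and an app, so just putting one of those on your phone is the convenient way to go.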