Fetching stock data in Python with the re, bs4, and requests modules - 侯体宗的博客

Python / Posted by admin 8 years ago 293

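I had some free time today and happened to come across Baidu Stocks (百度股票), so I decided to scrape some data with Python. I then found the Eastmoney site (quote.eastmoney.com) and combined the two into a small crawler that saves its data to a file. It is a fairly simple example, meant as practice with regular expressions and BeautifulSoup.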

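Start with the page analysis. Open the Eastmoney stock list page and a Baidu Stocks detail page, then right-click and view the page source. The code at the end of each detail-page URL is the stock code, so the plan is to collect the stock codes from the list page first and then fetch the details for each one. Without further ado, the complete script appears below.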
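The first step, pulling a stock code out of a link, can be tried in isolation before running the whole scraper. The sketch below is a minimal offline illustration: the `hrefs` list is made-up sample data standing in for links on the list page, and `extract_stock_code` is a hypothetical helper name that does not appear in the script. It uses the same pattern as the script, `sh` or `sz` followed by six digits.

```python
import re

# Same pattern the script uses: "sh" or "sz" followed by six digits.
STOCK_CODE_RE = re.compile(r's[hz]\d{6}')


def extract_stock_code(href):
    """Return the first stock code found in href, or None if there is none."""
    match = STOCK_CODE_RE.search(href)
    return match.group(0) if match else None


# Made-up sample links (assumed shape, for illustration only).
hrefs = [
    'http://quote.eastmoney.com/sh600000.html',
    'http://quote.eastmoney.com/sz000001.html',
    'http://quote.eastmoney.com/center/',
]

# Keep only the links that actually contain a stock code.
codes = [code for code in map(extract_stock_code, hrefs) if code]
print(codes)
```

Returning `None` for links without a code avoids relying on an exception to skip them, which is how the full script handles the same case.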

import re

import requests
from bs4 import BeautifulSoup


# Fetch a page and return its HTML text ('' on failure)
def getHtml(url):
    try:
        req = requests.get(url)
        req.raise_for_status()
        req.encoding = req.apparent_encoding
        return req.text
    except requests.RequestException:
        print('getHtml failed')
        return ''


# Collect stock codes from the links on the list page
def getStockList(lst, stockUrl):
    html = getHtml(stockUrl)
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')
    for i in a:
        try:
            href = i.attrs['href']
            lst.append(re.findall(r'[s][hz]\d{6}', href)[0])
        except (KeyError, IndexError):
            # No href attribute, or the href contains no stock code
            continue


# Fetch each stock's detail page and append its data to the output file
def getStockInfo(lst, stockUrl, fpath):
    count = 0
    for stock in lst:
        url = stockUrl + stock + '.html'
        html = getHtml(url)
        try:
            if html == '':
                continue
            infoDict = {}
            soup = BeautifulSoup(html, 'html.parser')
            stockInfo = soup.find('div', attrs={'class': 'stock-bets'})
            name = stockInfo.find_all(attrs={'class': 'bets-name'})[0]
            # '股票名称' means "stock name"
            infoDict.update({'股票名称': name.text.split()[0]})
            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
            with open(fpath, 'a', encoding='utf-8') as f:
                f.write(str(infoDict) + '\n')
            count += 1
            print('\rProgress: {:.2f}%'.format(count * 100 / len(lst)), end='')
        except Exception:
            # The page did not have the expected structure; count it and move on
            count += 1
            print('\rProgress (error): {:.2f}%'.format(count * 100 / len(lst)), end='')
            continue


def main():
    stockListUrl = 'http://quote.eastmoney.com/stocklist.html'
    stockInfotUrl = 'https://gupiao.baidu.com/stock/'
    # Raw string so the backslashes in the Windows path are not treated as escapes
    outPutFile = r'D:\python\shuju\stockInfo.txt'
    slist = []
    getStockList(slist, stockListUrl)
    getStockInfo(slist, stockInfotUrl, outPutFile)


if __name__ == '__main__':
    main()
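Because the script writes each stock as `str(infoDict)` on its own line, the output file can be loaded back into dictionaries later. The sketch below shows one possible way to do that with `ast.literal_eval`. The helper name `load_stock_records`, the temporary file, and the sample records are all assumptions made for illustration, not part of the original script.

```python
import ast
import os
import tempfile


def load_stock_records(path):
    """Parse a file with one str(dict) per line back into a list of dicts."""
    records = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line:
                # literal_eval only accepts Python literals, unlike eval()
                records.append(ast.literal_eval(line))
    return records


# Made-up records, written the same way the script writes them.
sample = [
    {'股票名称': 'Example A', 'example_field': '10.00'},
    {'股票名称': 'Example B', 'example_field': '20.50'},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'stockInfo.txt')
    with open(demo_path, 'a', encoding='utf-8') as f:
        for record in sample:
            f.write(str(record) + '\n')
    loaded = load_stock_records(demo_path)

# True if the records survive the round trip unchanged
print(loaded == sample)
```

If the data is meant to be consumed by other tools, writing one JSON object per line with `json.dumps` would be a reasonable alternative to `str(infoDict)`.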

That is all for this article. I hope it helps with your learning, and thank you for your support.
