Extracting links from an XML sitemap with Python: source code explained - 侯体宗的博客

Python / Posted by the administrator 8 years ago 294

A friend in a group chat needed to extract the links from an XML sitemap, so I wrote this program.

Code:

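# coding=utf-8
import re
import urllib.request

url = 'http://zhimo.yuanzhumuban.cc/sitemaps.xml'

# Download the sitemap and decode the bytes to text
html = urllib.request.urlopen(url).read().decode('utf-8')

# Match every link on the site that ends in .html (dots in the host are escaped)
r = re.compile(r'(http://zhimo\.yuanzhumuban\.cc.*?\.html)')
big = re.findall(r, html)

# Print each link and append it to xml.txt; the file is opened once, not per link
with open('xml.txt', 'a', encoding='utf-8') as op_xml_txt:
    for i in big:
        print(i)
        op_xml_txt.write('%s\n' % i)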
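The regular expression above treats the sitemap as plain text. As an alternative, here is a minimal sketch that reads the same kind of file with the standard library's `xml.etree.ElementTree` and collects the text of each `<loc>` element. It assumes the sitemap follows the usual `urlset`/`url`/`loc` layout, which the original post does not confirm. The sample document, its URL paths, and the `extract_links` helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the downloaded sitemaps.xml;
# the URL paths are invented for this illustration.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://zhimo.yuanzhumuban.cc/a/1.html</loc></url>
  <url><loc>http://zhimo.yuanzhumuban.cc/a/2.html</loc></url>
  <url><loc>http://zhimo.yuanzhumuban.cc/about/</loc></url>
</urlset>
"""


def extract_links(sitemap_text, suffix='.html'):
    """Return the text of every <loc> element that ends with `suffix`."""
    # Parse from bytes so the encoding declaration in the prolog is honoured
    root = ET.fromstring(sitemap_text.encode('utf-8'))
    links = []
    for element in root.iter():
        # Namespaced tags look like '{namespace}loc'; compare the local name only
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'loc' and element.text:
            link = element.text.strip()
            if link.endswith(suffix):
                links.append(link)
    return links


if __name__ == '__main__':
    for link in extract_links(SAMPLE_SITEMAP):
        print(link)
```

To use it against the real sitemap, the decoded text downloaded with `urllib.request` could be passed to `extract_links` in place of `SAMPLE_SITEMAP`. Parsing the XML avoids picking up partial matches that a non-greedy regular expression may produce when two links sit on the same line.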

Further reading:

Extracting content from an XML file with Python 3

import xml.dom.minidom


def find_child(Par_nodes, mystr):
    # Recursively append the text of every descendant node, with newlines removed
    for child_node in Par_nodes:
        if len(child_node.childNodes) > 0:
            mystr = find_child(child_node.childNodes, mystr)
        elif child_node.nodeValue is not None:
            mystr += child_node.data.replace('\n', '')
    return mystr


if __name__ == '__main__':
    dom1 = xml.dom.minidom.parse('2.XML')  # open the XML file
    root = dom1.documentElement  # get the document element object
    # Look up elements by tag name; returns a list of matching element nodes
    app_nums = root.getElementsByTagName('base:DocNumber')
    app_num = app_nums[2]
    print('Patent application number: ' + app_num.firstChild.data)
    titles = root.getElementsByTagName('business:InventionTitle')
    title = titles[0]
    print('Patent title: ' + title.firstChild.data)
    Paragraphs = root.getElementsByTagName('base:Paragraphs')
    abstract = Paragraphs[0]
    print('Patent abstract: ' + abstract.firstChild.data)
    company_names = root.getElementsByTagName('base:Name')
    company_name = company_names[0]
    print('Company name: ' + company_name.firstChild.data)
    mystr = ''
    for i in range(len(Paragraphs)):
        # '发明内容' ("summary of the invention") marks the start of the section to collect
        if Paragraphs[i].firstChild.data == '发明内容\n\t':
            i += 1
            # Collect until '附图说明' ("description of the drawings") or the end of the list
            while i < len(Paragraphs) and Paragraphs[i].firstChild.data != '附图说明\n\t':
                mystr = find_child(Paragraphs[i].childNodes, mystr)
                i += 1
    print('Summary of the invention: ' + mystr)
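The script above depends on a local file, `2.XML`, whose structure is not shown. To make the two techniques it relies on easier to try out (looking up elements by their prefixed tag name, and recursively gathering text from nested nodes), here is a small self-contained sketch that parses an in-memory string with `xml.dom.minidom.parseString`. The sample document, its namespace URI, and the helpers `collect_text` and `first_text` are assumptions made for illustration, not part of the original program.

```python
import xml.dom.minidom

# Hypothetical miniature document; the layout of the real patent file is assumed
SAMPLE_XML = (
    '<root xmlns:base="http://example.com/base">'
    '<base:DocNumber>CN0001</base:DocNumber>'
    '<base:Paragraphs>First <b>bold</b> part.\n</base:Paragraphs>'
    '</root>'
)


def collect_text(nodes, text=''):
    """Recursively concatenate the text under `nodes`, dropping newlines."""
    for node in nodes:
        if node.childNodes:
            # Element with children: descend into it
            text = collect_text(node.childNodes, text)
        elif node.nodeValue is not None:
            # Leaf text node: append its character data
            text += node.data.replace('\n', '')
    return text


def first_text(root, tag_name):
    """Return the leading text of the first element named `tag_name`, or None."""
    matches = root.getElementsByTagName(tag_name)
    if not matches:
        return None
    child = matches[0].firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return None
    return child.data


if __name__ == '__main__':
    root = xml.dom.minidom.parseString(SAMPLE_XML).documentElement
    print(first_text(root, 'base:DocNumber'))
    paragraphs = root.getElementsByTagName('base:Paragraphs')
    print(collect_text(paragraphs[0].childNodes))
```

Returning `None` when an element is missing or does not start with text avoids the `IndexError` or `AttributeError` that direct indexing and `.firstChild.data` would raise on an unexpected document.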

That covers all of the example code in this article. Thank you for reading and for your support.
