Python Example: Scraping Movie Download Links - 侯体宗的博客

Python / Posted by the administrator 8 years ago 295

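This article walks through an example of scraping movie download links with Python. It is shared here for reference; the details are as follows: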

#!/usr/bin/python
# coding=UTF-8
# Note: this is a Python 2 script (urllib2, reload(sys), print statement).
import sys
import urllib2

from bs4 import BeautifulSoup

# Python 2 only: make UTF-8 the default for implicit str/unicode conversions.
reload(sys)
sys.setdefaultencoding("utf-8")


# Extract the video download address from a movie's HTML page.
def get_movie_download_url(html):
    soup = BeautifulSoup(html, 'html.parser')
    td = soup.find('td', attrs={'style': 'WORD-WRAP: break-word'})
    url_a = td.find('a')
    return url_a.string


# Extract the movie title from a movie's HTML page.
def get_movie_title(html):
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h1')
    return title.string


# Fetch a URL and return the raw HTML of the page.
def get_html(url):
    req = urllib2.Request(url)
    req.add_header('User-Agent', 'Mozilla/5.0')
    # Open the Request object (not the bare url) so the User-Agent header is sent.
    response = urllib2.urlopen(req)
    html = response.read()
    return html


# From a movie list page, collect each movie's URL, prepend the host,
# and return the results as a list.
def get_movie_list(url):
    m_list = []
    html = get_html(url)
    soup = BeautifulSoup(html, 'html.parser')
    a_urls = soup.find_all('a', attrs={'class': 'ulink'})
    host = "http://www.ygdy8.net"
    for a_url in a_urls:
        m_url = a_url.get('href')
        m_list.append(host + m_url)
    return m_list


# Append a string to the output txt file.
def file_edit(wr_str):
    f1 = open(r'e:\down_load_url.txt', 'a')
    f1.write(wr_str)
    f1.close()


# Given a list of movie page URLs, fetch each title and download address
# and write them to the file.
def write_to_txt(a_urls):
    for a_url in a_urls:
        html = get_html(a_url)
        html = html.decode('GBK')  # the pages are decoded as GBK
        write_title = get_movie_title(html)
        write_url = get_movie_download_url(html)
        file_edit(write_title + "\n")
        file_edit(write_url + "\n")
        file_edit("\n")


# Given a page count, return the URLs of that many list pages.
def get_pages_url(num):
    urls_list = []
    url = "http://www.ygdy8.net/html/gndy/dyzz/list_23_"
    for n in range(1, num + 1):
        new_url = url + str(n) + ".html"
        urls_list.append(new_url)
    return urls_list


if __name__ == '__main__':
    pages = 2  # number of list pages to scrape
    p_url = get_pages_url(pages)
    for i in p_url:
        write_to_txt(get_movie_list(i))  # fetch and write
    print "done"
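The two URL-building steps of the script (generating list-page URLs and turning each `ulink` href into a full movie URL) can be tried without touching the network. The following is only a minimal sketch in Python 3, not the author's code: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs with no third-party packages, and it uses `urljoin` instead of string concatenation. `UlinkParser`, `SAMPLE_LIST_HTML`, and the `example_1.html` / `example_2.html` paths are hypothetical names made up for illustration; the real list pages may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

HOST = "http://www.ygdy8.net"  # host taken from the original script
LIST_URL_PREFIX = HOST + "/html/gndy/dyzz/list_23_"  # prefix taken from the original script


def get_pages_url(num):
    """Return the URLs of list pages 1..num."""
    return ["{}{}.html".format(LIST_URL_PREFIX, n) for n in range(1, num + 1)]


class UlinkParser(HTMLParser):
    """Hypothetical helper: collect the href of every <a class="ulink"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "ulink" in classes and href:
            self.hrefs.append(href)


def get_movie_list(list_html, base_url=HOST):
    """Return absolute movie-page URLs found in the HTML of a list page."""
    parser = UlinkParser()
    parser.feed(list_html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Made-up sample markup standing in for a downloaded list page (assumption).
SAMPLE_LIST_HTML = """
<table>
  <tr><td><a class="ulink" href="/html/gndy/dyzz/example_1.html">Movie A</a></td></tr>
  <tr><td><a class="other" href="/ignored.html">Not a movie</a></td></tr>
  <tr><td><a class="ulink" href="/html/gndy/dyzz/example_2.html">Movie B</a></td></tr>
</table>
"""

page_urls = get_pages_url(2)
movie_urls = get_movie_list(SAMPLE_LIST_HTML)
print(page_urls)
print(movie_urls)
```

Using `urljoin` rather than `host + href` is a design choice of this sketch: it also handles hrefs that are already absolute, which plain concatenation would corrupt.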


I hope this article is helpful for your Python programming.
