Python 3: Scraping Suning Merchant Contact Phone Numbers with the selenium Package - 侯体宗's Blog
Python / Posted by the administrator 8 years ago

Introduction to Selenium

Selenium is an automated testing tool for websites. It supports mainstream GUI browsers such as Chrome, Firefox, and Safari, and it also supports the headless PhantomJS browser.

This post uses the selenium package to drive the Firefox browser and stores the scraped information in a CSV file.
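As a rough illustration of the CSV storage step, the sketch below writes one row per merchant and substitutes the string "null" for empty fields, which is the placeholder the script in this post uses. The helper name `write_rows` and the sample row are hypothetical, and an in-memory buffer stands in for the real `suning.csv` file.

```python
import csv
import io


def write_rows(stream, rows):
    """Write (name, phone, address) rows, replacing empty fields with "null"."""
    writer = csv.writer(stream)
    for row in rows:
        writer.writerow([field if field else "null" for field in row])


# In-memory buffer for demonstration; a real run would more likely use
# open("suning.csv", "a", encoding="utf-8", newline="") instead.
buf = io.StringIO()
write_rows(buf, [("示例公司", "", "上海市")])  # hypothetical sample data
csv_text = buf.getvalue()
print(csv_text)
```

Opening the file with an explicit UTF-8 encoding is what keeps Chinese text from being garbled; passing `newline=""` is the setting the `csv` module documentation recommends for file objects.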

Environment setup is not covered in detail here. If any prerequisite is missing, look up how to install it yourself.
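One way to check the prerequisites before running the scraper is sketched below, assuming the two requirements are the `selenium` Python package and a `geckodriver` executable on PATH (the script drives Firefox). The function name `check_prerequisites` is hypothetical; it only reports what is found and installs nothing.

```python
import importlib.util
import shutil


def check_prerequisites(package="selenium", driver="geckodriver"):
    """Report whether the package is importable and the driver is on PATH."""
    return {
        package: importlib.util.find_spec(package) is not None,
        driver: shutil.which(driver) is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(name, "found" if ok else "missing")
```

If the package is reported missing, `pip install selenium` is the usual way to get it. The driver binary has to be obtained separately on older Selenium releases.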

# -*- coding: utf-8 -*-
"""
Created on Wed Dec 11 20:21:04 2019

@author: Administrator
"""
import codecs
import csv
import random
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Number of result pages to crawl (default: 50)
yema = 50
# URL to crawl: the results page shown when searching for a specific product.
# Note: searching Suning for an industry term yields a different kind of page.
wangzhi = "https://search.suning.com/%E4%BC%91%E9%97%B2%E9%A3%9F%E5%93%81/"

# codecs.open with UTF-8 keeps Chinese text from being garbled when written
f = codecs.open('suning.csv', 'a', encoding='utf-8')
csv_writer = csv.writer(f)

# Chrome (alternative; requires chromedriver on PATH)
# browser1 = webdriver.Chrome()
# Firefox (requires geckodriver on PATH)
browser1 = webdriver.Firefox()


def browser_1(url, browser=browser1):
    # Open the page and return the same browser instance
    browser.get(url)
    return browser


browser = browser_1(wangzhi)

# Find an element by class name
# input_guanggao = browser.find_element(By.CLASS_NAME, "close-btn")
# Click it
# input_guanggao.click()
# Type into an input
# input_txt.send_keys("111")
# "Next page" button
# next_page = browser.find_element(By.CLASS_NAME, "next")

# Data extraction: collect shop links from each results page
urls = []
nub = 1
for i in range(yema - 1):
    print(i)
    # Scroll to the bottom of the page
    js = "var q=document.documentElement.scrollTop=100000"
    browser.execute_script(js)
    time.sleep(random.randint(5, 10))
    shops = browser.find_elements(By.CLASS_NAME, "sellPoint")
    for shop in shops:
        url = shop.get_attribute('href')
        vip = "/0000000000/"
        # Skip elements without an href and links containing the vip marker
        if url and vip not in url:
            urls.append(url)
            print(i, "--", nub, "--", url)
            nub += 1
    print(i, "page")
    # Scroll back near the top before clicking the next-page button
    js = "var q=document.documentElement.scrollTop=500"
    browser.execute_script(js)
    time.sleep(random.randint(3, 5))
    next_page = browser.find_element(By.CLASS_NAME, "next")
    time.sleep(random.randint(3, 5))
    next_page.click()
    time.sleep(random.randint(5, 8))

print("---" * 10)

# Visit every collected shop page and record its contact details
for ul in urls:
    browser_shop = browser_1(ul)
    # Company name
    chead_companyName = browser_shop.find_element(By.ID, "chead_companyName")
    # Phone number
    chead_telPhone = browser_shop.find_element(By.ID, "chead_telPhone")
    # Address
    chead_companyAddress = browser_shop.find_element(By.ID, "chead_companyAddress")
    browser_shop.find_element(By.CLASS_NAME, "storname").click()
    companyName = chead_companyName.text
    if companyName == "":
        companyName = "null"
    telPhone = chead_telPhone.text
    if telPhone == "":
        telPhone = "null"
    companyAddress = chead_companyAddress.text
    if companyAddress == "":
        companyAddress = "null"
    print(companyName, "==", telPhone, "==", companyAddress)
    csv_writer.writerow([companyName, telPhone, companyAddress])
    # browser_shop.close()

f.close()
print("Done")
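Two small pieces of the script can be tried without a browser. The sketch below shows how a search URL like the one stored in `wangzhi` could be produced from a keyword, and how the collected links could be filtered. The helper names `build_search_url` and `filter_shop_urls` and the `shop.example` links are hypothetical. Removing duplicates is an addition for illustration; the original script only skips links containing `/0000000000/`.

```python
from urllib.parse import quote

SEARCH_BASE = "https://search.suning.com/"


def build_search_url(keyword):
    """Percent-encode a keyword as UTF-8 and append it to the search base URL."""
    return SEARCH_BASE + quote(keyword) + "/"


def filter_shop_urls(hrefs, skip_marker="/0000000000/"):
    """Drop missing links, links containing skip_marker, and duplicates."""
    seen = set()
    kept = []
    for href in hrefs:
        if not href or skip_marker in href or href in seen:
            continue
        seen.add(href)
        kept.append(href)  # first-seen order is preserved
    return kept


search_url = build_search_url("休闲食品")
shop_urls = filter_shop_urls([
    "https://shop.example/0070000001/a.html",  # made-up links for the demo
    None,
    "https://shop.example/0000000000/b.html",
    "https://shop.example/0070000001/a.html",
])
print(search_url)
print(shop_urls)
```

Keeping these steps as plain functions makes it possible to check the URL handling separately from the slower browser automation.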

Summary

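This post showed how to scrape Suning merchant contact phone numbers with the selenium package in Python 3. I hope it is useful; if you have any questions, leave a comment and I will reply as soon as I can.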