侯体宗的博客
  • 首页
  • Hyperf版
  • beego仿版
  • 人生(杂谈)
  • 技术
  • 关于我
  • 更多分类
    • 文件下载
    • 文字修仙
    • 中国象棋ai
    • 群聊
    • 九宫格抽奖
    • 拼图
    • 消消乐
    • 相册

win7 x64系统中安装Scrapy的方法

Windows  /  管理员 发布于 5年前   313

scrapy是用python开发的爬虫框架,从网上查了安装方法,感觉都说的挺复杂,而且很多教程都很有年头了,于是记录了自己的安装过程。

首先安装python,地址:https://www.python.org/downloads/release/python-2710/,注意根据你的系统下64位(Windows x86-64 MSI installer)还是32位的(Windows x86 MSI installer)。

现在是python3.6的天下了,建议大家安装python3版本。

装完以后就可以安装scrapy了,推荐使用pip方式安装,因为scrapy需要调用很多额外的库,pip会全部帮你安装好,不需要你在到处翻找了。

pip在python安装完后就已经有了,不需要额外安装,下面只要按照scrapy官网推荐的方法在命令提示符中输入pip installscrapy(图1),然后只需静静等待即可大功告成。


图1

装完以后可以敲入命令pip list看看已安装的库(图2),出来很多啊,pip真是好东西。


图2

现在试下看看建个爬虫项目,按照说明文档键入命令scrapy startproject tutorial,目录已经出来(图3),看来是没问题了。但为了验证是否安装成功,还得跑一下看看,第一次创建项目的时候,系统会提示可以跑个例子看看(图4)。按照提示键入命令


图3

图4

scrapy genspider example example.com创建一个爬虫,再键入命令scrapy crawl example

运行爬虫,结果如下(图5),报错了,貌似是缺少win32api,立即上网下了一个(http://sourceforge.net/projects/pywin32/files/pywin32/Build%20219/),


图5

下的时候注意对应的python版本。win32api装好以后再运行一次爬虫(图6),这次成功了,应该是没问题了。


图6

总结一下,其实刚开始网上找资料的时候看到上面写的要先装这个库那个库的时候心中很忐忑,结果发现不是很复杂,大多数问题pip都给解决了,剩下的就是具体问题具体研究,不过也没碰到很复杂解决不了的问题。另外吐下槽就是网上的教程互抄的太厉害,看着一搜一堆,其实多数都大同小异,真正有价值的没几个,没大腿抱就是辛苦呀。

最后说一下,scrapy目前还不支持python3.x版本,我用的是python2.7,如果你碰到莫名其妙的问题时请先看看自己有没有装错python版本。

下面是其他网友补充的文章

环境

Windows7 64位
Python2.7.6 64位

Python的安装:

  • 打开http://www.python.org/getit/releases/2.7.6/页面,下载Python-2.7.6.amd64.msi 进行安装,安装完成后,需要配置环境变量,环境变量的配置可以参考该文章
  • 测试python是否安装成功,如果python成功安装并且配置好环境变量,那么在cmd中输入python,就能得到python版本的详细信息(如32位或64位)
C:\Users\Administrator>pythonPython 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32

easy_install的安装

保存ez_setup.py至本地,如D盘(如果失效了,可以参考下https:///article/151027.htm

#!/usr/bin/env python"""Setuptools bootstrapping installer.Maintained at https://github.com/pypa/setuptools/tree/bootstrap.Run this script to install or upgrade setuptools.This method is DEPRECATED. Check https://github.com/pypa/setuptools/issues/581 for more details."""import osimport shutilimport sysimport tempfileimport zipfileimport optparseimport subprocessimport platformimport textwrapimport contextlibfrom distutils import logtry: from urllib.request import urlopenexcept ImportError: from urllib2 import urlopentry: from site import USER_SITEexcept ImportError: USER_SITE = None# 33.1.1 is the last version that supports setuptools self upgrade/installation.DEFAULT_VERSION = "33.1.1"DEFAULT_URL = "https://pypi.io/packages/source/s/setuptools/"DEFAULT_SAVE_DIR = os.curdirDEFAULT_DEPRECATION_MESSAGE = "ez_setup.py is deprecated and when using it setuptools will be pinned to {0} since it's the last version that supports setuptools self upgrade/installation, check https://github.com/pypa/setuptools/issues/581 for more info; use pip to install setuptools"MEANINGFUL_INVALID_ZIP_ERR_MSG = 'Maybe {0} is corrupted, delete it and try again.'log.warn(DEFAULT_DEPRECATION_MESSAGE.format(DEFAULT_VERSION))def _python_cmd(*args): """ Execute a command. Return True if the command succeeded. """ args = (sys.executable,) + args return subprocess.call(args) == 0def _install(archive_filename, install_args=()): """Install Setuptools.""" with archive_context(archive_filename): # installing log.warn('Installing Setuptools') if not _python_cmd('setup.py', 'install', *install_args):  log.warn('Something went wrong during the installation.')  log.warn('See the error message above.')  # exitcode will be 2  return 2def _build_egg(egg, archive_filename, to_dir): """Build Setuptools egg.""" with archive_context(archive_filename): # building an egg log.warn('Building a Setuptools egg in %s', to_dir) _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir) # returning the result log.warn(egg) if not os.path.exists(egg): raise IOError('Could not build the egg.')class ContextualZipFile(zipfile.ZipFile): """Supplement ZipFile class to support context manager for Python 2.6.""" def __enter__(self): return self def __exit__(self, type, value, traceback): self.close() def __new__(cls, *args, **kwargs): """Construct a ZipFile or ContextualZipFile as appropriate.""" if hasattr(zipfile.ZipFile, '__exit__'):  return zipfile.ZipFile(*args, **kwargs) return super(ContextualZipFile, cls).__new__(cls)@contextlib.contextmanagerdef archive_context(filename): """ Unzip filename to a temporary directory, set to the cwd. The unzipped target is cleaned up after. """ tmpdir = tempfile.mkdtemp() log.warn('Extracting in %s', tmpdir) old_wd = os.getcwd() try: os.chdir(tmpdir) try:  with ContextualZipFile(filename) as archive:  archive.extractall() except zipfile.BadZipfile as err:  if not err.args:  err.args = ('', )  err.args = err.args + (  MEANINGFUL_INVALID_ZIP_ERR_MSG.format(filename),  )  raise # going in the directory subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0]) os.chdir(subdir) log.warn('Now working in %s', subdir) yield finally: os.chdir(old_wd) shutil.rmtree(tmpdir)def _do_download(version, download_base, to_dir, download_delay): """Download Setuptools.""" py_desig = 'py{sys.version_info[0]}.{sys.version_info[1]}'.format(sys=sys) tp = 'setuptools-{version}-{py_desig}.egg' egg = os.path.join(to_dir, tp.format(**locals())) if not os.path.exists(egg): archive = download_setuptools(version, download_base,  to_dir, download_delay) _build_egg(egg, archive, to_dir) sys.path.insert(0, egg) # Remove previously-imported pkg_resources if present (see # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details). if 'pkg_resources' in sys.modules: _unload_pkg_resources() import setuptools setuptools.bootstrap_install_from = eggdef use_setuptools( version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=DEFAULT_SAVE_DIR, download_delay=15): """ Ensure that a setuptools version is installed. Return None. Raise SystemExit if the requested version or later cannot be installed. """ to_dir = os.path.abspath(to_dir) # prior to importing, capture the module state for # representative modules. rep_modules = 'pkg_resources', 'setuptools' imported = set(sys.modules).intersection(rep_modules) try: import pkg_resources pkg_resources.require("setuptools>=" + version) # a suitable version is already installed return except ImportError: # pkg_resources not available; setuptools is not installed; download pass except pkg_resources.DistributionNotFound: # no version of setuptools was found; allow download pass except pkg_resources.VersionConflict as VC_err: if imported:  _conflict_bail(VC_err, version) # otherwise, unload pkg_resources to allow the downloaded version to # take precedence. del pkg_resources _unload_pkg_resources() return _do_download(version, download_base, to_dir, download_delay)def _conflict_bail(VC_err, version): """ Setuptools was imported prior to invocation, so it is unsafe to unload it. Bail out. """ conflict_tmpl = textwrap.dedent(""" The required version of setuptools (>={version}) is not available, and can't be installed while this script is running. Please install a more recent version first, using 'easy_install -U setuptools'. (Currently using {VC_err.args[0]!r}) """) msg = conflict_tmpl.format(**locals()) sys.stderr.write(msg) sys.exit(2)def _unload_pkg_resources(): sys.meta_path = [ importer for importer in sys.meta_path if importer.__class__.__module__ != 'pkg_resources.extern' ] del_modules = [ name for name in sys.modules if name.startswith('pkg_resources') ] for mod_name in del_modules: del sys.modules[mod_name]def _clean_check(cmd, target): """ Run the command to download target. If the command fails, clean up before re-raising the error. """ try: subprocess.check_call(cmd) except subprocess.CalledProcessError: if os.access(target, os.F_OK):  os.unlink(target) raisedef download_file_powershell(url, target): """ Download the file at url to target using Powershell. Powershell will validate trust. Raise an exception if the command cannot complete. """ target = os.path.abspath(target) ps_cmd = ( "[System.Net.WebRequest]::DefaultWebProxy.Credentials = " "[System.Net.CredentialCache]::DefaultCredentials; " '(new-object System.Net.WebClient).DownloadFile("%(url)s", "%(target)s")' % locals() ) cmd = [ 'powershell', '-Command', ps_cmd, ] _clean_check(cmd, target)def has_powershell(): """Determine if Powershell is available.""" if platform.system() != 'Windows': return False cmd = ['powershell', '-Command', 'echo test'] with open(os.path.devnull, 'wb') as devnull: try:  subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except Exception:  return False return Truedownload_file_powershell.viable = has_powershelldef download_file_curl(url, target): cmd = ['curl', url, '--location', '--silent', '--output', target] _clean_check(cmd, target)def has_curl(): cmd = ['curl', '--version'] with open(os.path.devnull, 'wb') as devnull: try:  subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except Exception:  return False return Truedownload_file_curl.viable = has_curldef download_file_wget(url, target): cmd = ['wget', url, '--quiet', '--output-document', target] _clean_check(cmd, target)def has_wget(): cmd = ['wget', '--version'] with open(os.path.devnull, 'wb') as devnull: try:  subprocess.check_call(cmd, stdout=devnull, stderr=devnull) except Exception:  return False return Truedownload_file_wget.viable = has_wgetdef download_file_insecure(url, target): """Use Python to download the file, without connection authentication.""" src = urlopen(url) try: # Read all the data in one block. data = src.read() finally: src.close() # Write all the data in one block to avoid creating a partial file. with open(target, "wb") as dst: dst.write(data)download_file_insecure.viable = lambda: Truedef get_best_downloader(): downloaders = ( download_file_powershell, download_file_curl, download_file_wget, download_file_insecure, ) viable_downloaders = (dl for dl in downloaders if dl.viable()) return next(viable_downloaders, None)def download_setuptools( version=DEFAULT_VERSION, download_base=DEFAULT_URL, to_dir=DEFAULT_SAVE_DIR, delay=15, downloader_factory=get_best_downloader): """ Download setuptools from a specified location and return its filename. `version` should be a valid setuptools version number that is available as an sdist for download under the `download_base` URL (which should end with a '/'). `to_dir` is the directory where the egg will be downloaded. `delay` is the number of seconds to pause before an actual download attempt. ``downloader_factory`` should be a function taking no arguments and returning a function for downloading a URL to a target. """ # making sure we use the absolute path to_dir = os.path.abspath(to_dir) zip_name = "setuptools-%s.zip" % version url = download_base + zip_name saveto = os.path.join(to_dir, zip_name) if not os.path.exists(saveto): # Avoid repeated downloads log.warn("Downloading %s", url) downloader = downloader_factory() downloader(url, saveto) return os.path.realpath(saveto)def _build_install_args(options): """ Build the arguments to 'python setup.py install' on the setuptools package. Returns list of command line arguments. """ return ['--user'] if options.user_install else []def _parse_args(): """Parse the command line for options.""" parser = optparse.OptionParser() parser.add_option( '--user', dest='user_install', action='store_true', default=False, help='install in user site package') parser.add_option( '--download-base', dest='download_base', metavar="URL", default=DEFAULT_URL, help='alternative URL from where to download the setuptools package') parser.add_option( '--insecure', dest='downloader_factory', action='store_const', const=lambda: download_file_insecure, default=get_best_downloader, help='Use internal, non-validating downloader' ) parser.add_option( '--version', help="Specify which version to download", default=DEFAULT_VERSION, ) parser.add_option( '--to-dir', help="Directory to save (and re-use) package", default=DEFAULT_SAVE_DIR, ) options, args = parser.parse_args() # positional arguments are ignored return optionsdef _download_args(options): """Return args for download_setuptools function from cmdline args.""" return dict( version=options.version, download_base=options.download_base, downloader_factory=options.downloader_factory, to_dir=options.to_dir, )def main(): """Install or upgrade setuptools and EasyInstall.""" options = _parse_args() archive = download_setuptools(**_download_args(options)) return _install(archive, _build_install_args(options))if __name__ == '__main__': sys.exit(main())

在cmd中运行:

d:\>python ez_setup.py

进行SetupTools的安装

在运行的时候会发生一个错误,该错误为"ascii codec can't decode byte 0xe8 in position 0:ordinal not in range(128)",大意为ascii编码不能解析byte 0xe8。
解决方法:找到并打开python根目录/Lib/mimetypes.py文件,在import urllib后,添加代码:

reload(sys)sys.setdefaultencoding('gbk')

把默认编码方式改为gbk(网上有写用utf8的,在这个脚本中是无效的,需要改成gbk格式)。重新执行python ez_setup.py,如果出现刷屏的安装信息,则说明安装成功了。此时,在python目录下多了一个Script文件夹,easy_install就在里面

Scrapy依赖项的安装

Scrapy的依赖项

安装lxml-3.2.4.win32-py2.7.exe(64位系统需要安装lxml-3.2.4.win-amd64-py2.7.exe)
安装pywin32-218.win32-py2.7.exe(64位系统需要安装pywin32-218.win-amd64-py2.7.exe)
安装Twisted-13.2.0.win32-py2.7.exe(64位系统需要安装Twisted-13.2.0.win-amd64-py2.7.exe)
安装pyOpenSSL-0.13.1.win32-py2.7.exe(64位系统需要安装pyOpenSSL-0.13.1.win-amd64-py2.7.exe)
将zope.interface-4.0.5-py2.7-win32.egg拷贝到C:\Python27\Scripts目录下,执行$ easy_install.exe zope.interface-4.0.5-py2.7-win32.egg

验证scrapy依赖项是否安装成功的方法:

cmd执行$ python进入python控制台

执行import lxml,如果没报错,则说明lxml安装成功
执行import twisted,如果没报错,则说明twisted安装成功
执行import OpenSSL,如果没报错,则说明OpenSSL安装成功
执行import zope.interface,如果没报错,则说明zope.interface安装成功
如果安装成功,那么在cmd中执行& python,然后执行import lxml,如果没有报错,则说明lxml安装成功。

安装Scrapy

方法1: 控制台输入:easy_install scrapy
方法2:解压缩Scrapy-0.22.2.tar.gz,在其目录下执行$ python setup.py install进行Scrapy的安装。

检查Scrapy是否安装成功的方法:可以在cmd控制台执行 $ scrapy ,如果没有报错,说明安装成功。

相关文章

  • Windows下Scrapy环境搭建
  • Windows 8.1 (64bit) 下搭建 Scrapy 0.22 环境
  • Python第三方Window模块安装文件

这篇文章就介绍到这了,需要的朋友可以参考一下。


  • 上一条:
    Window环境下Scrapy开发环境搭建
    下一条:
    Windows下Python3.6安装第三方模块的方法
  • 昵称:

    邮箱:

    0条评论 (评论内容有缓存机制,请悉知!)
    最新最热
    • 分类目录
    • 人生(杂谈)
    • 技术
    • linux
    • Java
    • php
    • 框架(架构)
    • 前端
    • ThinkPHP
    • 数据库
    • 微信(小程序)
    • Laravel
    • Redis
    • Docker
    • Go
    • swoole
    • Windows
    • Python
    • 苹果(mac/ios)
    • 相关文章
    • Windows 10的告别:2025年10月14日,一段时代的终结(0个评论)
    • windows 11激活_Win11 KMS激活流程步骤(1个评论)
    • 安装Windows 11系统的注意了,看看你的cpu是否在微软兼容列表排除中(1个评论)
    • 微软将于2022年9月20日推送Windows11 22H2新版本,推测2024发布windows 12(0个评论)
    • windows11系统中可以关闭禁止的服务及介绍(1个评论)
    • 近期文章
    • 智能合约Solidity学习CryptoZombie第二课:让你的僵尸猎食(0个评论)
    • 智能合约Solidity学习CryptoZombie第一课:生成一只你的僵尸(0个评论)
    • 在go中实现一个常用的先进先出的缓存淘汰算法示例代码(0个评论)
    • 在go+gin中使用"github.com/skip2/go-qrcode"实现url转二维码功能(0个评论)
    • 在go语言中使用api.geonames.org接口实现根据国际邮政编码获取地址信息功能(1个评论)
    • 在go语言中使用github.com/signintech/gopdf实现生成pdf分页文件功能(0个评论)
    • gmail发邮件报错:534 5.7.9 Application-specific password required...解决方案(0个评论)
    • 欧盟关于强迫劳动的规定的官方举报渠道及官方举报网站(0个评论)
    • 在go语言中使用github.com/signintech/gopdf实现生成pdf文件功能(0个评论)
    • Laravel从Accel获得5700万美元A轮融资(0个评论)
    • 近期评论
    • 122 在

      学历:一种延缓就业设计,生活需求下的权衡之选中评论 工作几年后,报名考研了,到现在还没认真学习备考,迷茫中。作为一名北漂互联网打工人..
    • 123 在

      Clash for Windows作者删库跑路了,github已404中评论 按理说只要你在国内,所有的流量进出都在监控范围内,不管你怎么隐藏也没用,想搞你分..
    • 原梓番博客 在

      在Laravel框架中使用模型Model分表最简单的方法中评论 好久好久都没看友情链接申请了,今天刚看,已经添加。..
    • 博主 在

      佛跳墙vpn软件不会用?上不了网?佛跳墙vpn常见问题以及解决办法中评论 @1111老铁这个不行了,可以看看近期评论的其他文章..
    • 1111 在

      佛跳墙vpn软件不会用?上不了网?佛跳墙vpn常见问题以及解决办法中评论 网站不能打开,博主百忙中能否发个APP下载链接,佛跳墙或极光..
    • 2018-01
    • 2018-06
    • 2020-06
    • 2021-06
    • 2021-07
    • 2022-01
    • 2022-04
    • 2022-08
    • 2023-08
    • 2023-10
    • 2024-04
    Top

    Copyright·© 2019 侯体宗版权所有· 粤ICP备20027696号 PHP交流群

    侯体宗的博客