侯体宗的博客
  • 首页
  • Hyperf版
  • beego仿版
  • 人生(杂谈)
  • 技术
  • 关于我
  • 更多分类
    • 文件下载
    • 文字修仙
    • 中国象棋ai
    • 群聊
    • 九宫格抽奖
    • 拼图
    • 消消乐
    • 相册

pandas read_excel()和to_excel()函数解析

技术  /  管理员 发布于 7年前   212

前言

数据分析时候,需要将数据进行加载和存储,本文主要介绍和excel的交互。

read_excel()

加载函数为read_excel(),其具体参数如下。

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None,names=None, parse_cols=None, parse_dates=False,date_parser=None,na_values=None,thousands=None, convert_float=True, has_index_names=None, converters=None,dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)

常用参数解析:

  • io : string, path object ; excel 路径。
  • sheetname : string, int, mixed list of strings/ints, or None, default 0 返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe
  • header : int, list of ints, default 0 指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None
  • skiprows : list-like,Rows to skip at the beginning,省略指定行数的数据
  • skip_footer : int,default 0, 省略从尾部数的int行数据
  • index_col : int, list of ints, default None指定列为索引列,也可以使用u”strings”
  • names : array-like, default None, 指定列的名字。

数据源:

sheet1:ID NUM-1  NUM-2  NUM-336901  142 168 66136902  78 521 60236903  144 600 52136904  95 457 46836905  69 596 695sheet2:ID NUM-1  NUM-2  NUM-336906  190 527 69136907  101 403 470

(1)函数原型

basestation ="F://pythonBook_PyPDAM/data/test.xls"data = pd.read_excel(basestation)print data

输出:是一个dataframe

   ID NUM-1 NUM-2 NUM-30 36901  142  168  6611 36902   78  521  6022 36903  144  600  5213 36904   95  457  4684 36905   69  596  695

(2) sheetname参数:返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe

data_1 = pd.read_excel(basestation,sheetname=[0,1])print data_1print type(data_1)

输出:dict of dataframe

OrderedDict([(0,    ID NUM-1 NUM-2 NUM-30 36901  142  168  6611 36902   78  521  6022 36903  144  600  5213 36904   95  457  4684 36905   69  596  695), (1,    ID NUM-1 NUM-2 NUM-30 36906  190  527  6911 36907  101  403  470)])

(3)header参数:指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None ,注意这里还有列名的一行。

data = pd.read_excel(basestation,header=None)print data输出:    0   1   2   30   ID NUM-1 NUM-2 NUM-31 36901  142  168  6612 36902   78  521  6023 36903  144  600  5214 36904   95  457  4685 36905   69  596  695data = pd.read_excel(basestation,header=[3])print data输出:  36903 144  600  521 0 36904   95  457  4681 36905   69  596  695

(4) skiprows 参数:省略指定行数的数据

data = pd.read_excel(basestation,skiprows = [1])print data输出:   ID NUM-1 NUM-2 NUM-30 36902   78  521  6021 36903  144  600  5212 36904   95  457  4683 36905   69  596  695

(5)skip_footer参数:省略从尾部数的int行的数据

data = pd.read_excel(basestation, skip_footer=3)print data输出:   ID NUM-1 NUM-2 NUM-30 36901  142  168  6611 36902   78  521  602

(6)index_col参数:指定列为索引列,也可以使用u”strings”

data = pd.read_excel(basestation, index_col="NUM-3")print data输出:     ID NUM-1 NUM-2NUM-3           661  36901  142  168602  36902   78  521521  36903  144  600468  36904   95  457695  36905   69  596

(7)names参数: 指定列的名字。

data = pd.read_excel(basestation,names=["a","b","c","e"])print data    a  b  c  e0 36901 142 168 6611 36902  78 521 6022 36903 144 600 5213 36904  95 457 4684 36905  69 596 695

具体参数如下:

>>> print help(pandas.read_excel)Help on function read_excel in module pandas.io.excel:read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)  Read an Excel table into a pandas DataFrame  Parameters  ----------  io : string, path object (pathlib.Path or py._path.local.LocalPath),    file-like object, pandas ExcelFile, or xlrd workbook.    The string could be a URL. Valid URL schemes include http, ftp, s3,    and file. For file URLs, a host is expected. For instance, a local    file could be file://localhost/path/to/workbook.xlsx  sheetname : string, int, mixed list of strings/ints, or None, default 0    Strings are used for sheet names, Integers are used in zero-indexed    sheet positions.    Lists of strings/integers are used to request multiple sheets.    Specify None to get all sheets.    str|int -> DataFrame is returned.    list|None -> Dict of DataFrames is returned, with keys representing    sheets.    Available Cases    * Defaults to 0 -> 1st sheet as a DataFrame    * 1 -> 2nd sheet as a DataFrame    * "Sheet1" -> 1st sheet as a DataFrame    * [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames    * None -> All sheets as a dictionary of DataFrames  header : int, list of ints, default 0    Row (0-indexed) to use for the column labels of the parsed    DataFrame. If a list of integers is passed those row positions will    be combined into a ``MultiIndex``  skiprows : list-like    Rows to skip at the beginning (0-indexed)  skip_footer : int, default 0    Rows at the end to skip (0-indexed)  index_col : int, list of ints, default None    Column (0-indexed) to use as the row labels of the DataFrame.    Pass None if there is no such column. If a list is passed,    those columns will be combined into a ``MultiIndex``. If a    subset of data is selected with ``parse_cols``, index_col    is based on the subset.  names : array-like, default None    List of column names to use. If file contains no header row,    then you should explicitly pass header=None  converters : dict, default None    Dict of functions for converting values in certain columns. Keys can    either be integers or column labels, values are functions that take one    input argument, the Excel cell content, and return the transformed    content.  dtype : Type name or dict of column -> type, default None    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}    Use `object` to preserve data as stored in Excel and not interpret dtype.    If converters are specified, they will be applied INSTEAD    of dtype conversion.    .. versionadded:: 0.20.0  true_values : list, default None    Values to consider as True    .. versionadded:: 0.19.0  false_values : list, default None    Values to consider as False    .. versionadded:: 0.19.0  parse_cols : int or list, default None    * If None then parse all columns,    * If int then indicates last column to be parsed    * If list of ints then indicates list of column numbers to be parsed    * If string then indicates comma separated list of Excel column letters and     column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of     both sides.  squeeze : boolean, default False    If the parsed data only contains one column then return a Series  na_values : scalar, str, list-like, or dict, default None    Additional strings to recognize as NA/NaN. If dict passed, specific    per-column NA values. By default the following values are interpreted    as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',  '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.  thousands : str, default None    Thousands separator for parsing string columns to numeric. Note that    this parameter is only necessary for columns stored as TEXT in Excel,    any numeric columns will automatically be parsed, regardless of display    format.  keep_default_na : bool, default True    If na_values are specified and keep_default_na is False the default NaN    values are overridden, otherwise they're appended to.  verbose : boolean, default False    Indicate number of NA values placed in non-numeric columns  engine: string, default None    If io is not a buffer or path, this must be set to identify io.    Acceptable values are None or xlrd  convert_float : boolean, default True    convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric    data will be read in as floats: Excel stores all numbers as floats    internally  has_index_names : boolean, default None    DEPRECATED: for version 0.17+ index names will be automatically    inferred based on index_col. To read Excel output from 0.16.2 and    prior that had saved index names, use True.  Returns

to_excel()

存储函数为pd.DataFrame.to_excel(),注意,必须是DataFrame写入excel, 即Write DataFrame to an excel sheet。其具体参数如下:

to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None,columns=None, header=True, index=True, index_label=None,startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None,inf_rep='inf', verbose=True, freeze_panes=None)

常用参数解析

  • - excel_writer : string or ExcelWriter object File path or existing ExcelWriter目标路径
  • - sheet_name : string, default ‘Sheet1' Name of sheet which will contain DataFrame,填充excel的第几页
  • - na_rep : string, default ”,Missing data representation 缺失值填充
  • - float_format : string, default None Format string for floating point numbers
  • - columns : sequence, optional,Columns to write 选择输出的的列。
  • - header : boolean or list of string, default True Write out column names. If a list of string is given it is assumed to be aliases for the column names
  • - index : boolean, default True,Write row names (index)
  • - index_label : string or sequence, default None, Column label for index column(s) if desired. If None is given, andheader and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
  • - startrow :upper left cell row to dump data frame
  • - startcol :upper left cell column to dump data frame
  • - engine : string, default None ,write engine to use - you can also set this via the options,io.excel.xlsx.writer, io.excel.xls.writer, andio.excel.xlsm.writer.
  • - merge_cells : boolean, default True Write MultiIndex and Hierarchical Rows as merged cells.
  • - encoding: string, default None encoding of the resulting excel file. Only necessary for xlwt,other writers support unicode natively.
  • - inf_rep : string, default ‘inf' Representation for infinity (there is no native representation for infinity in Excel)
  • - freeze_panes : tuple of integer (length 2), default None Specifies the one-based bottommost row and rightmost column that is to be frozen

数据源:

  ID NUM-1  NUM-2  NUM-30  36901  142 168 6611  36902  78 521 6022  36903  144 600 5213  36904  95 457 4684  36905  69 596 6955  36906  165 453 加载数据:basestation ="F://python/data/test.xls"basestation_end ="F://python/data/test_end.xls"data = pd.read_excel(basestation)

(1)参数excel_writer,输出路径。

data.to_excel(basestation_end)输出:  ID NUM-1  NUM-2  NUM-30  36901  142 168 6611  36902  78 521 6022  36903  144 600 5213  36904  95 457 4684  36905  69 596 6955  36906  165 453 

(2)sheet_name,将数据存储在excel的那个sheet页面。

data.to_excel(basestation_end,sheet_name="sheet2")

(3)na_rep,缺失值填充

data.to_excel(basestation_end,na_rep="NULL")输出:  ID NUM-1  NUM-2  NUM-30  36901  142 168 6611  36902  78 521 6022  36903  144 600 5213  36904  95 457 4684  36905  69 596 6955  36906  165 453 NULL

(4) colums参数: sequence, optional,Columns to write 选择输出的的列。

data.to_excel(basestation_end,columns=["ID"])输出  ID0  369011  369022  369033  369044  369055  36906

(5)header 参数: boolean or list of string,默认为True,可以用list命名列的名字。header = False 则不输出题头。

data.to_excel(basestation_end,header=["a","b","c","d"])输出:  a  b  c  d0  36901  142 168 6611  36902  78 521 6022  36903  144 600 5213  36904  95 457 4684  36905  69 596 6955  36906  165 453 data.to_excel(basestation_end,header=False,columns=["ID"])header = False 则不输出题头输出:0  369011  369022  369033  369044  369055  36906

(6)index : boolean, default True Write row names (index)

默认为True,显示index,当index=False 则不显示行索引(名字)。

index_label : string or sequence, default None

设置索引列的列名。

data.to_excel(basestation_end,index=False)输出:ID NUM-1  NUM-2  NUM-336901  142 168 66136902  78 521 60236903  144 600 52136904  95 457 46836905  69 596 69536906  165 453 data.to_excel(basestation_end,index_label=["f"])输出:f  ID NUM-1  NUM-2  NUM-30  36901  142 168 6611  36902  78 521 6022  36903  144 600 5213  36904  95 457 4684  36905  69 596 6955  36906  165 453 

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持。


  • 上一条:
    postman传递当前时间戳实例详解
    下一条:
    opencv调整图像亮度对比度的示例代码
  • 昵称:

    邮箱:

    0条评论 (评论内容有缓存机制,请悉知!)
    最新最热
    • 分类目录
    • 人生(杂谈)
    • 技术
    • linux
    • Java
    • php
    • 框架(架构)
    • 前端
    • ThinkPHP
    • 数据库
    • 微信(小程序)
    • Laravel
    • Redis
    • Docker
    • Go
    • swoole
    • Windows
    • Python
    • 苹果(mac/ios)
    • 相关文章
    • gmail发邮件报错:534 5.7.9 Application-specific password required...解决方案(0个评论)
    • 2024.07.09日OpenAI将终止对中国等国家和地区API服务(0个评论)
    • 2024/6/9最新免费公益节点SSR/V2ray/Shadowrocket/Clash节点分享|科学上网|免费梯子(1个评论)
    • 国外服务器实现api.openai.com反代nginx配置(0个评论)
    • 2024/4/28最新免费公益节点SSR/V2ray/Shadowrocket/Clash节点分享|科学上网|免费梯子(1个评论)
    • 近期文章
    • 在go中实现一个常用的先进先出的缓存淘汰算法示例代码(0个评论)
    • 在go+gin中使用"github.com/skip2/go-qrcode"实现url转二维码功能(0个评论)
    • 在go语言中使用api.geonames.org接口实现根据国际邮政编码获取地址信息功能(1个评论)
    • 在go语言中使用github.com/signintech/gopdf实现生成pdf分页文件功能(0个评论)
    • gmail发邮件报错:534 5.7.9 Application-specific password required...解决方案(0个评论)
    • 欧盟关于强迫劳动的规定的官方举报渠道及官方举报网站(0个评论)
    • 在go语言中使用github.com/signintech/gopdf实现生成pdf文件功能(0个评论)
    • Laravel从Accel获得5700万美元A轮融资(0个评论)
    • 在go + gin中gorm实现指定搜索/区间搜索分页列表功能接口实例(0个评论)
    • 在go语言中实现IP/CIDR的ip和netmask互转及IP段形式互转及ip是否存在IP/CIDR(0个评论)
    • 近期评论
    • 122 在

      学历:一种延缓就业设计,生活需求下的权衡之选中评论 工作几年后,报名考研了,到现在还没认真学习备考,迷茫中。作为一名北漂互联网打工人..
    • 123 在

      Clash for Windows作者删库跑路了,github已404中评论 按理说只要你在国内,所有的流量进出都在监控范围内,不管你怎么隐藏也没用,想搞你分..
    • 原梓番博客 在

      在Laravel框架中使用模型Model分表最简单的方法中评论 好久好久都没看友情链接申请了,今天刚看,已经添加。..
    • 博主 在

      佛跳墙vpn软件不会用?上不了网?佛跳墙vpn常见问题以及解决办法中评论 @1111老铁这个不行了,可以看看近期评论的其他文章..
    • 1111 在

      佛跳墙vpn软件不会用?上不了网?佛跳墙vpn常见问题以及解决办法中评论 网站不能打开,博主百忙中能否发个APP下载链接,佛跳墙或极光..
    • 2016-10
    • 2016-11
    • 2017-07
    • 2017-08
    • 2017-09
    • 2018-01
    • 2018-07
    • 2018-08
    • 2018-09
    • 2018-12
    • 2019-01
    • 2019-02
    • 2019-03
    • 2019-04
    • 2019-05
    • 2019-06
    • 2019-07
    • 2019-08
    • 2019-09
    • 2019-10
    • 2019-11
    • 2019-12
    • 2020-01
    • 2020-03
    • 2020-04
    • 2020-05
    • 2020-06
    • 2020-07
    • 2020-08
    • 2020-09
    • 2020-10
    • 2020-11
    • 2021-04
    • 2021-05
    • 2021-06
    • 2021-07
    • 2021-08
    • 2021-09
    • 2021-10
    • 2021-12
    • 2022-01
    • 2022-02
    • 2022-03
    • 2022-04
    • 2022-05
    • 2022-06
    • 2022-07
    • 2022-08
    • 2022-09
    • 2022-10
    • 2022-11
    • 2022-12
    • 2023-01
    • 2023-02
    • 2023-03
    • 2023-04
    • 2023-05
    • 2023-06
    • 2023-07
    • 2023-08
    • 2023-09
    • 2023-10
    • 2023-12
    • 2024-02
    • 2024-04
    • 2024-05
    • 2024-06
    • 2025-02
    Top

    Copyright·© 2019 侯体宗版权所有· 粤ICP备20027696号 PHP交流群

    侯体宗的博客