侯体宗的博客 (Hou Tizong's Blog)

Working with Word and Excel documents from Python via LibreOffice

Python  /  Posted by admin 7 years ago   144

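1. Starting and stopping the LibreOffice service

Before starting, the modification times of the font files are set to one common value. When the soffice service is created, it checks the timestamps of the files it needs to load; if it considers a timestamp inconsistent, it may reload the file, which takes a long time. Unifying the timestamps beforehand avoids this.

If the service has to be used many times, it is best to start it and stop it again for every call. Otherwise LibreOffice creates a cache document that keeps growing with use, and processing time increases.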
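The timestamp synchronization can also be done from Python rather than through `find | xargs touch`. The following is a minimal sketch under that assumption; the function name `set_tree_mtime` and the constant `FIXED_STAMP` are hypothetical and do not appear in the original code.

```python
import os
import time

# 2018-01-01 00:00:00 local time, the same instant as `touch -t 201801010000.00`.
FIXED_STAMP = time.mktime((2018, 1, 1, 0, 0, 0, 0, 0, -1))


def set_tree_mtime(root, stamp):
    """Set the modification time of root and everything below it to stamp.

    Returns the number of paths touched. The access time is left unchanged,
    mirroring the -m flag of touch.
    """
    touched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk visits every directory once as dirpath, so directories
        # are covered as well as the files inside them.
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            atime = os.stat(path).st_atime
            os.utime(path, (atime, stamp))
            touched += 1
    return touched
```

On a real system this would be called as `set_tree_mtime('/usr/share/fonts', FIXED_STAMP)`, which normally requires write permission on that directory. Broken symbolic links are not handled in this sketch.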

    import os
    import signal
    import subprocess
    import time

    import uno


    class OfficeProcess(object):
        def __init__(self):
            self.p = 0
            # Give every font file the same fixed modification time.
            subprocess.Popen('find /usr/share/fonts | xargs touch -m -t 201801010000.00', shell=True)

        def start_office(self):
            self.p = subprocess.Popen('soffice --pidfile=sof.pid --invisible --accept="socket,host=localhost,port=2002;urp;"', shell=True)
            # Poll once per second until the UNO bridge accepts a connection.
            while True:
                try:
                    local_context = uno.getComponentContext()
                    resolver = local_context.getServiceManager().createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', local_context)
                    resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
                    return
                except Exception:
                    # ts() is a timestamp helper that the original listing does not define.
                    print(ts(), "wait for connecting soffice...")
                    time.sleep(1)

        def stop_office(self):
            # soffice wrote its process id to sof.pid (--pidfile option above).
            with open("sof.pid", "rb") as f:
                try:
                    os.kill(int(f.read()), signal.SIGTERM)
                    self.p.wait()
                except Exception:
                    pass
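The advice to start and stop the service around every call can be sketched as a context manager, so that the stop step also runs when the work in between raises an exception. This is only an illustration: `managed_process` and `STAND_IN` are hypothetical names, and a short-lived Python child process stands in for soffice so that the example runs without LibreOffice installed.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def managed_process(argv, stop_timeout=10.0):
    """Start a child process, yield it, and always stop it afterwards."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            # The child ignored the polite request; force it to exit.
            proc.kill()
            proc.wait()


# Stand-in for the soffice command line used in the article.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(60)"]

with managed_process(STAND_IN) as proc:
    running = proc.poll() is None   # the child is alive inside the block
stopped = proc.poll() is not None   # and has exited once the block ends
```

Note that the article signals the process id recorded in sof.pid rather than the Popen handle. This sketch terminates the handle directly, which may not be enough if the launcher starts a separate worker process.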

2. Initializing the service manager

    # Fragment from a method of the converter class; it sets self.ctx,
    # self.smgr and self.desktop, which the later steps rely on.
    local_context = uno.getComponentContext()
    service_manager = local_context.getServiceManager()
    resolver = service_manager.createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', local_context)
    self.ctx = resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
    self.smgr = self.ctx.ServiceManager
    self.desktop = self.smgr.createInstanceWithContext('com.sun.star.frame.Desktop', self.ctx)
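Resolving the UNO URL only succeeds once soffice is listening on port 2002. As a rough sketch, a plain TCP check with a deadline can be used to wait for that port before resolving. This replaces the article's unbounded retry loop with a different, simpler technique; it only proves that something accepts connections on the port, not that the UNO bridge is ready. The function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the settings from the article this would be called as `wait_for_port('localhost', 2002)`, and the caller could give up or restart soffice when it returns False.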

3. Reading a doc document from binary data

    # Requires: from com.sun.star.beans import PropertyValue
    def ImportFromMemory(self, data):
        # Wrap the raw bytes in a UNO input stream.
        istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
        istream.initialize((uno.ByteSequence(data), ))
        pv = PropertyValue()
        pv.Name = 'InputStream'
        pv.Value = istream
        self.doc = {'doc': []}
        try:
            self.document = self.desktop.loadComponentFromURL('private:stream/swriter', '_blank', 0, (pv, ))
            self.text = self.document.getText()
        except Exception:
            # Loading failed; ExportToJson will then return an empty document.
            self.text = None

4. Extracting the data from the doc document

    # Methods of the WordToJson class. They need json, re, base64, uno and
    # PropertyValue, and self.table_id must be initialized elsewhere
    # (the original listing does not show where).
    def ExportToJson(self):
        try:
            l = self.__ParseText(self.text, self.__Callback(self.doc['doc']))
            self.doc['length'] = l
        except Exception:
            self.doc = {'doc': [], 'length': 0}
        return json.dumps(self.doc)

    @staticmethod
    def __Callback(alist):
        # Return a closure that appends parsed items to alist.
        def Append(sth):
            alist.append(sth)
        return Append

    def __ParseText(self, text, func):
        l = 0
        text_it = text.createEnumeration()
        while text_it.hasMoreElements():
            element = text_it.nextElement()
            if element.supportsService('com.sun.star.text.Paragraph'):
                l += self.__ParseParagraph(element, func)
            elif element.supportsService('com.sun.star.text.TextTable'):
                l += self.__ParseTable(element, func)
        return l

    def __ParseParagraph(self, paragraph, func):
        p = {'paragraph': []}
        l = 0
        paragraph_it = paragraph.createEnumeration()
        while paragraph_it.hasMoreElements():
            portion = paragraph_it.nextElement()
            if portion.TextPortionType == 'Text':
                l += self.__ParsePortionText(portion, self.__Callback(p['paragraph']))
            elif portion.TextPortionType == 'SoftPageBreak':
                pass
            elif portion.TextPortionType == 'TextField':
                l += self.__ParsePortionText(portion, self.__Callback(p['paragraph']))
            else:
                l += self.__ParseTextContent(portion, self.__Callback(p['paragraph']))
        if hasattr(paragraph, 'createContentEnumeration'):
            l += self.__ParseTextContent(paragraph, self.__Callback(p['paragraph']))
        p['length'] = l
        func(p)
        return l

    def __ParseTextContent(self, textcontent, func):
        l = 0
        content_it = textcontent.createContentEnumeration('com.sun.star.text.TextContent')
        while content_it.hasMoreElements():
            element = content_it.nextElement()
            if element.supportsService('com.sun.star.text.TextGraphicObject'):
                l += self.__ParsePortionGraphic(element, func)
            elif element.supportsService('com.sun.star.text.TextEmbeddedObject'):
                pass
            elif element.supportsService('com.sun.star.text.TextFrame'):
                l += self.__ParseFrame(element, func)
            elif element.supportsService('com.sun.star.drawing.GroupShape'):
                l += self.__ParseGroup(element, func)
        return l

    def __ParseFrame(self, frame, func):
        f = {'frame': []}
        l = self.__ParseText(frame.getText(), self.__Callback(f['frame']))
        f['length'] = l
        func(f)
        return l

    def __ParseGroup(self, group, func):
        l = 0
        for i in range(group.getCount()):
            it = group.getByIndex(i)
            if it.supportsService('com.sun.star.drawing.Text'):
                l += self.__ParseFrame(it, func)
        return l

    def __ParsePortionText(self, portion_text, func):
        func({'portion': portion_text.String, 'length': len(portion_text.String)})
        return len(portion_text.String)

    def __ParsePortionGraphic(self, portion_graphic, func):
        # Render the graphic as PNG into a temporary stream, then base64-encode it.
        gp = self.smgr.createInstanceWithContext('com.sun.star.graphic.GraphicProvider', self.ctx)
        stream = self.smgr.createInstanceWithContext('com.sun.star.io.TempFile', self.ctx)
        pv1 = PropertyValue()
        pv1.Name = 'OutputStream'
        pv1.Value = stream
        pv2 = PropertyValue()
        pv2.Name = 'MimeType'
        pv2.Value = 'image/png'
        gp.storeGraphic(portion_graphic.Graphic, (pv1, pv2))
        stream.getOutputStream().flush()
        stream.seek(0)
        l = stream.getInputStream().available()
        b = uno.ByteSequence(b'')
        stream.seek(0)
        l, b = stream.getInputStream().readBytes(b, l)
        img = {'image': base64.b64encode(b.value).decode('ascii')}
        img['height'] = portion_graphic.Height
        img['width'] = portion_graphic.Width
        img['actualheight'] = portion_graphic.ActualSize.Height
        img['actualwidth'] = portion_graphic.ActualSize.Width
        img['croptop'] = portion_graphic.GraphicCrop.Top
        img['cropbottom'] = portion_graphic.GraphicCrop.Bottom
        img['cropleft'] = portion_graphic.GraphicCrop.Left
        img['cropright'] = portion_graphic.GraphicCrop.Right
        img['length'] = 0  # images do not count towards the text length
        func(img)
        return 0

    def __ParseTable(self, table, func):
        l = 0
        try:
            matrix = self.__GetTableMatrix(table)
            seps = self.__GetTableSeparators(table)
            t = {}
            count = 0
            for ri in matrix.keys():
                t[ri] = {}
                for ci in matrix[ri].keys():
                    t[ri][ci] = dict(matrix[ri][ci])
                    del t[ri][ci]['cell']
                    t[ri][ci]['content'] = []
                    l += self.__ParseText(matrix[ri][ci]['cell'], self.__Callback(t[ri][ci]['content']))
                    count += t[ri][ci]['rowspan'] * t[ri][ci]['colspan']
            # The spans of all cells must cover the full rows x columns grid.
            if count != len(t) * len(seps):
                raise ValueError('count of cells error')
            func({'table': t, 'row': len(t), 'column': len(seps), 'length': l, 'tableid': self.table_id})
            self.table_id += 1
        except Exception:
            l = 0
            print('discard wrong table')
        return l

    @staticmethod
    def __GetTableSeparators(table):
        # Collect every column separator position used by any row.
        result = [table.TableColumnRelativeSum]
        for ri in range(table.getRows().getCount()):
            result += [s.Position for s in table.getRows().getByIndex(ri).TableColumnSeparators]
        result = sorted(set(result))
        # Positions that differ by exactly 1 are treated as the same separator.
        for i in range(len(result) - 1):
            result[i] += 1 if result[i] + 1 == result[i + 1] else 0
        return sorted(set(result))

    @staticmethod
    def __NameToRC(name):
        # Convert a cell name such as 'B3' to zero-based (row, column).
        # Column letters run A-Z, then a-z (52 symbols). Multi-letter names
        # are not offset here, so 'AA' yields the same column index as 'A'.
        r = int(re.sub('[A-Za-z]', '', name)) - 1
        cstr = re.sub('[0-9]', '', name)
        c = 0
        for i in range(len(cstr)):
            if cstr[i] >= 'A' and cstr[i] <= 'Z':
                c = c * 52 + ord(cstr[i]) - ord('A')
            else:
                c = c * 52 + 26 + ord(cstr[i]) - ord('a')
        return r, c

    @staticmethod
    def __GetTableMatrix(table):
        result = {}
        for name in table.getCellNames():
            ri, ci = WordToJson.__NameToRC(name)
            cell = table.getCellByName(name)
            if ri not in result:
                result[ri] = {}
            result[ri][ci] = {'cell': cell, 'rowspan': cell.RowSpan, 'name': name}
        seps = WordToJson.__GetTableSeparators(table)
        for ri in result.keys():
            sep = [s.Position for s in table.getRows().getByIndex(ri).TableColumnSeparators] + [table.TableColumnRelativeSum]
            sep = sorted(set(sep))
            for ci in result[ri].keys():
                # colspan = number of global separators between the cell's left and right edges.
                right = seps.index(sep[ci]) if sep[ci] in seps else seps.index(sep[ci] + 1)
                if ci == 0:
                    left = -1
                elif sep[ci - 1] in seps:
                    left = seps.index(sep[ci - 1])
                else:
                    left = seps.index(sep[ci - 1] + 1)
                result[ri][ci]['colspan'] = right - left
        return result
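The cell-name conversion used by the table parser can be tried out on its own. The sketch below is a standalone variant rather than the author's method: it assumes that Writer numbers columns A to Z, then a to z, and continues with two-letter names such as AA after that, so it adds an offset per letter to keep 'AA' distinct from 'A'. The name `name_to_rc` is hypothetical, and names of split cells (for example 'A1.1.1') are rejected instead of parsed.

```python
import re

_CELL_NAME = re.compile(r"^([A-Za-z]+)([0-9]+)$")


def name_to_rc(name):
    """Convert a Writer cell name like 'B3' to zero-based (row, column)."""
    match = _CELL_NAME.match(name)
    if match is None:
        raise ValueError("unsupported cell name: %r" % (name,))
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        # 'A'..'Z' are symbols 0..25, 'a'..'z' are symbols 26..51.
        digit = ord(ch) - ord("A") if ch <= "Z" else 26 + ord(ch) - ord("a")
        # The +1 makes the numbering bijective, so 'AA' follows 'z'.
        col = col * 52 + digit + 1
    return int(digits) - 1, col - 1
```

For single-letter columns this gives the same result as `__NameToRC`; the two only differ for tables with more than 52 columns.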

5. Writing a doc document

    # The listing assumes that ControlCharacter, FontWeight, NONE (a WrapTextMode
    # value) and LEFT (a ParagraphAdjust value) were imported from the matching
    # com.sun.star modules; the original does not show these imports.

    # Fragment: create an empty Writer document and a cursor to write with.
    self.doco = self.desktop.loadComponentFromURL('private:factory/swriter', '_blank', 0, ())
    self.texto = self.doco.getText()
    self.cursoro = self.texto.createTextCursor()
    self.cursoro.ParaBottomMargin = 500

    def __WriteText(self, text, texto, cursoro):
        for it in text:
            if 'paragraph' in it:
                self.__WriteParagraph(it, texto, cursoro)
            elif 'image' in it:
                self.__WritePortionGraphic(it, texto, cursoro)
            elif 'table' in it:
                self.__WriteTable(it, texto, cursoro)

    def __WriteParagraph(self, paragraph, texto, cursoro):
        if paragraph['length'] > 0:
            if 'result' in paragraph:
                for it in paragraph['result']:
                    texto.insertString(cursoro, it['trans_sen'], False)
            else:
                texto.insertString(cursoro, paragraph['paragraph'], False)
            texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)

    def __WritePortionGraphic(self, portion_graphic, texto, cursoro):
        # Decode the base64 PNG and hand it to the graphic provider as a stream.
        png_base64 = portion_graphic['image']
        png = base64.b64decode(png_base64)
        gp = self.smgr.createInstanceWithContext('com.sun.star.graphic.GraphicProvider', self.ctx)
        istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
        istream.initialize((uno.ByteSequence(png), ))
        pv = PropertyValue()
        pv.Name = 'InputStream'
        pv.Value = istream
        actualsize = uno.createUnoStruct('com.sun.star.awt.Size')
        actualsize.Height = portion_graphic['actualheight'] if 'actualheight' in portion_graphic else portion_graphic['height']
        actualsize.Width = portion_graphic['actualwidth'] if 'actualwidth' in portion_graphic else portion_graphic['width']
        graphiccrop = uno.createUnoStruct('com.sun.star.text.GraphicCrop')
        graphiccrop.Top = portion_graphic['croptop'] if 'croptop' in portion_graphic else 0
        graphiccrop.Bottom = portion_graphic['cropbottom'] if 'cropbottom' in portion_graphic else 0
        graphiccrop.Left = portion_graphic['cropleft'] if 'cropleft' in portion_graphic else 0
        graphiccrop.Right = portion_graphic['cropright'] if 'cropright' in portion_graphic else 0
        image = self.doco.createInstance('com.sun.star.text.TextGraphicObject')
        image.Surround = NONE
        image.Graphic = gp.queryGraphic((pv, ))
        image.Height = portion_graphic['height']
        image.Width = portion_graphic['width']
        image.setPropertyValue('ActualSize', actualsize)
        image.setPropertyValue('GraphicCrop', graphiccrop)
        texto.insertTextContent(cursoro, image, False)
        texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)

    def __WriteTable(self, table, texto, cursoro):
        tableo = self.doco.createInstance('com.sun.star.text.TextTable')
        tableo.initialize(table['row'], table['column'])
        texto.insertTextContent(cursoro, tableo, False)
        # texto.insertControlCharacter(cursoro, ControlCharacter.PARAGRAPH_BREAK, False)
        tcursoro = tableo.createCursorByCellName("A1")
        # Probe: if extending the selection downwards leaves the range at 'A1',
        # the cursor cannot select ranges, so cell merging is skipped.
        hitbug = False
        if table['row'] > 1:
            tcursoro.goDown(1, True)
            hitbug = tcursoro.getRangeName() == 'A1'
        # Row and column keys are strings after the JSON round trip.
        for ri in sorted([int(r) for r in table['table'].keys()]):
            rs = table['table'][str(ri)]
            for ci in sorted([int(c) for c in rs.keys()]):
                cell = rs[str(ci)]
                if not hitbug and (cell['rowspan'] > 1 or cell['colspan'] > 1):
                    tcursoro.gotoCellByName(cell['name'], False)
                    if cell['rowspan'] > 1:
                        tcursoro.goDown(cell['rowspan'] - 1, True)
                    if cell['colspan'] > 1:
                        tcursoro.goRight(cell['colspan'] - 1, True)
                    tcursoro.mergeRange()
                ctexto = tableo.getCellByName(cell['name'])
                if ctexto is None:
                    continue
                ccursoro = ctexto.createTextCursor()
                ccursoro.CharWeight = FontWeight.NORMAL
                ccursoro.CharWeightAsian = FontWeight.NORMAL
                ccursoro.ParaAdjust = LEFT
                self.__WriteText(cell['content'], ctexto, ccursoro)
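`__WriteTable` converts the row and column keys with `int(...)` before sorting and looks the entries up again with `str(...)`. The reason is that the table dictionary uses integer keys when it is built, but JSON object keys are always strings. The small sketch below, with made-up sample data, shows the effect.

```python
import json

# Integer row indices, as produced while parsing a table (sample data).
table = {0: {0: "r0"}, 2: {0: "r2"}, 10: {0: "r10"}}

# After a JSON round trip every key is a string.
restored = json.loads(json.dumps(table))

# Sorting the string keys gives lexicographic order, which puts '10' before '2'.
string_order = sorted(restored.keys())

# Converting to int first restores the numeric row order.
numeric_order = sorted(int(k) for k in restored.keys())

# Lookups must convert back to str, as the article's code does with str(ri).
rows = [restored[str(ri)]["0"] for ri in numeric_order]
```

Without the conversion, a table with ten or more rows would be written in the wrong order.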

6. Generating the doc document as binary data

    # Store the document into an in-memory pipe and read the bytes back.
    streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.doco.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'MS Word 2007 XML', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datao = streamo.readBytes(None, streamo.available())

7. Generating PDF binary data from the doc document

    # Same approach as above, with the Writer PDF export filter.
    streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.doco.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'writer_pdf_Export', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datap = streamo.readBytes(None, streamo.available())

8. Reading Excel binary data

    def ImportFromMemory(self, data):
        # Wrap the raw bytes in a UNO input stream and open them with Calc.
        istream = self.smgr.createInstanceWithContext('com.sun.star.io.SequenceInputStream', self.ctx)
        istream.initialize((uno.ByteSequence(data), ))
        pv = PropertyValue()
        pv.Name = 'InputStream'
        pv.Value = istream
        self.doc = {'doc': []}
        try:
            print("before loadComponentFromURL")
            self.document = self.desktop.loadComponentFromURL('private:stream/scalc', '_blank', 0, (pv, ))
            self.sheets = self.document.getSheets()
            print("ImportFromMemory done")
        except Exception:
            print("ImportFromMemory failed")
            self.sheets = None

9. Reading the text data from Excel

    def ExportToJson(self):
        try:
            l = self.__ParseText(self.sheets, self.__Callback(self.doc['doc']))
            self.doc['length'] = l
        except Exception:
            self.doc = {'doc': [], 'length': 0}
        return json.dumps(self.doc)

    def __ParseText(self, sheets, func):
        l = 0
        sheets_it = sheets.createEnumeration()
        while sheets_it.hasMoreElements():
            element = sheets_it.nextElement()
            if element.supportsService('com.sun.star.sheet.Spreadsheet'):
                l += self.__ParseSpreadsheet(element, func)
        return l

    def __ParseSpreadsheet(self, spreadsheet, func):
        l = 0
        p = {'spreadsheet': []}
        visible_cells_it = spreadsheet.queryVisibleCells().getCells().createEnumeration()
        while visible_cells_it.hasMoreElements():
            cell = visible_cells_it.nextElement()
            cell_type = cell.getType()
            # Only TEXT cells are exported; the other types are just logged.
            if cell_type == self.EMPTY:
                print("cell.type==empty")
            elif cell_type == self.VALUE:
                print("cell.type==VALUE", "value=", cell.getValue(), cell.getCellAddress())
            elif cell_type == self.TEXT:
                print("cell.type==TEXT", "content=", cell.getString().encode("UTF-8"), cell.getCellAddress())
                l += self.__ParseCellText(spreadsheet, cell, self.__Callback(p['spreadsheet']))
                print("__ParseCellText=", p)
            elif cell_type == self.FORMULA:
                print("cell.type==FORMULA", "formula=", cell.getValue())
        p['length'] = l
        func(p)
        return l

    def __ParseCellText(self, sheet, cell, func):
        try:
            x = cell.getCellAddress().Column
            y = cell.getCellAddress().Row
            sheetname = sheet.getName()
        except Exception:
            x = -1
            y = -1
            sheetname = None
        func({'celltext': cell.getString(), 'x': x, 'y': y, 'sheetname': sheetname, 'length': len(cell.getString())})
        return len(cell.getString())

    # Fragment: cell content type constants, set up on the instance beforehand.
    self.EMPTY = uno.Enum("com.sun.star.table.CellContentType", "EMPTY")
    self.TEXT = uno.Enum("com.sun.star.table.CellContentType", "TEXT")
    self.FORMULA = uno.Enum("com.sun.star.table.CellContentType", "FORMULA")
    self.VALUE = uno.Enum("com.sun.star.table.CellContentType", "VALUE")

10. Replacing text in Excel

    def ImportFromJson(self, data):
        doc = json.loads(data)
        try:
            self.__WriteText(doc['doc'])
        except Exception:
            pass

    def __WriteText(self, text):
        print("__WriteText begin:", text)
        sheet = None
        for it in text:
            if 'paragraph' in it and 'sheetname' in it:
                # Look the sheet up again only when the sheet name changes.
                if sheet is None or sheet.getName() != it['sheetname']:
                    try:
                        sheet = self.sheets.getByName(it['sheetname'])
                        print("getsheet:", it['sheetname'], "=", sheet.getName())
                    except Exception:
                        sheet = None
                        continue
                self.__WriteParagraph(it, sheet)

    def __WriteParagraph(self, paragraph, sheet):
        print("__WriteParagraph")
        if paragraph['length'] > 0:
            try:
                x = paragraph['x']
                y = paragraph['y']
                print("getcell:", x, y)
                cell = sheet.getCellByPosition(x, y)
                print("getcell done")
            except Exception:
                return
            if 'result' in paragraph:
                for it in paragraph['result']:
                    print("cell=", cell.getString())
                    cell.setString(it['trans_sen'])
                    print("cell,", cell.getString(), ",done")
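Judging from the keys that `__WriteText` and `__WriteParagraph` read, `ImportFromJson` expects a list of items that each carry a sheet name, a cell position, a length and a list of results. The sketch below builds such a payload; the helper name `make_replacement` and the sample values are hypothetical, and the shape is inferred from the code above rather than from any documented format.

```python
import json


def make_replacement(sheetname, x, y, original, translated):
    """Build one item in the shape that __WriteText appears to expect."""
    return {
        "paragraph": original,       # only its presence is checked by the writer
        "sheetname": sheetname,      # sheet to look up with getByName
        "x": x,                      # zero-based column for getCellByPosition
        "y": y,                      # zero-based row for getCellByPosition
        "length": len(original),     # items with length 0 are skipped
        "result": [{"trans_sen": translated}],
    }


payload = json.dumps({"doc": [make_replacement("Sheet1", 0, 0, "hello", "bonjour")]})
```

Because `__WriteParagraph` calls `setString` once per entry in `result`, only the last entry remains in the cell when the list holds more than one.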

11. Generating the Excel document as binary data

    # Store the spreadsheet into an in-memory pipe and read the bytes back.
    streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.document.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'Calc MS Excel 2007 XML', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datao = streamo.readBytes(None, streamo.available())

12. Generating a PDF from the Excel document

    # Same approach as above, with the Calc PDF export filter.
    streamo = self.smgr.createInstanceWithContext('com.sun.star.io.Pipe', self.ctx)
    self.document.storeToURL('private:stream', (PropertyValue('FilterName', 0, 'calc_pdf_Export', 0), PropertyValue('OutputStream', 0, streamo, 0)))
    streamo.flush()
    _, datap = streamo.readBytes(None, streamo.available())
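The four export snippets in sections 6, 7, 11 and 12 differ only in the filter name passed to `storeToURL`. As a small sketch, the names can be kept in one lookup table. The filter strings are the ones used in the article; the keys ('writer', 'calc', 'docx', 'xlsx', 'pdf') and the function `filter_name` are hypothetical labels chosen for this example.

```python
# Filter names taken from the article, keyed by (component, output format).
FILTERS = {
    ("writer", "docx"): "MS Word 2007 XML",
    ("writer", "pdf"): "writer_pdf_Export",
    ("calc", "xlsx"): "Calc MS Excel 2007 XML",
    ("calc", "pdf"): "calc_pdf_Export",
}


def filter_name(component, fmt):
    """Return the export filter name for a component and output format."""
    try:
        return FILTERS[(component, fmt)]
    except KeyError:
        raise ValueError("no filter known for %s -> %s" % (component, fmt)) from None
```

The result would then be used as the value of the 'FilterName' property, for example `PropertyValue('FilterName', 0, filter_name('calc', 'pdf'), 0)`.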

That is all for this article. I hope it helps with your learning, and thank you for your support.

