Python PDF processing

Ewig · Jan 14, 2019 · 1347 views
    I'm processing some data inside a PDF and get the error below. Does it mean I need to change the library's source code from int to bytes?


    File being read: /home/shenjianlin/pdf_file/qimingpian_pdf/无线医疗白皮书-12 页.pdf
    Traceback (most recent call last):
      File "remove_water_mark.py", line 90, in <module>
        remove_water_mark().read_content()
      File "remove_water_mark.py", line 52, in read_content
        for i in range(0, pdf.getNumPages()):
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1155, in getNumPages
        self._flatten()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1505, in _flatten
        catalog = self.trailer["/Root"].getObject()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/generic.py", line 511, in __getitem__
        return dict.__getitem__(self, key).getObject()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/generic.py", line 178, in getObject
        return self.pdf.getObject(self).getObject()
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1599, in getObject
        idnum, generation = self.readObjectHeader(self.stream)
      File "/usr/lib64/python3.4/site-packages/PyPDF2/pdf.py", line 1667, in readObjectHeader
        return int(idnum), int(generation)
    ValueError: invalid literal for int() with base 10: b'bj'
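The `b'bj'` in the error looks like the tail of an `obj` or `endobj` keyword. That suggests the parser started reading at the wrong byte offset (for example because the file's cross-reference table is inaccurate), rather than an int-versus-bytes problem in PyPDF2. The sketch below is a simplified stand-in for the header parsing, not PyPDF2's actual code; `read_token`, `read_object_header`, and the sample bytes are hypothetical.

```python
import io

# Hypothetical sample: one PDF-style object, "<id> <generation> obj ... endobj".
SAMPLE = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"


def read_token(stream):
    """Read bytes up to (not including) the next whitespace byte or EOF."""
    token = b""
    while True:
        ch = stream.read(1)
        if not ch or ch.isspace():
            return token
        token += ch


def read_object_header(stream, offset):
    """Simplified stand-in: parse "<id> <generation>" starting at `offset`."""
    stream.seek(offset)
    idnum = read_token(stream)
    generation = read_token(stream)
    # int() accepts ASCII digits given as bytes, so bytes are not the problem.
    return int(idnum), int(generation)


stream = io.BytesIO(SAMPLE)
print(read_object_header(stream, 0))  # correct offset: prints (1, 0)

try:
    read_object_header(stream, 5)  # offset lands inside the "obj" keyword
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: b'bj'
```

If this is what happens with the file above, editing `int(...)` in `pdf.py` would probably not help. Re-saving or repairing the PDF with another tool before reading it seems a more likely fix, though that cannot be confirmed without the file itself.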
    Supplement 1  ·  Jan 14, 2019
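    import PyPDF2

    # Assumption: `pdf` is a PyPDF2.PdfFileReader and `pdf_output` a
    # PyPDF2.PdfFileWriter, both created earlier in read_content() (not shown).
    for i in range(0, pdf.getNumPages()):
        if i < 3:
            # Only the first three pages are checked for a watermark.
            Num_page_content = pdf.getPage(i)
            print(Num_page_content)
            if Num_page_content.get('/Resources'):
                page_resource = Num_page_content['/Resources']
                if page_resource.get('/XObject'):
                    xobject = page_resource['/XObject']
                    form = None
                    # Assumption: `flag` is never initialized in the post; reset it per page.
                    flag = False
                    for item in xobject:
                        # Keep only the first XObject whose name starts with '/FormXob'.
                        if item.startswith('/FormXob'):
                            if not flag:
                                flag = True
                                form = item
                    if form:
                        print('remove water mark in page: {}'.format(i))
                        xobject.pop(form)
            pdf_output.addPage(Num_page_content)
        else:
            # Remaining pages are copied unchanged.
            pdf_output.addPage(pdf.getPage(i))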
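The `flag`/`form` loop above appears to pick only the first `/XObject` key that starts with `/FormXob`. A minimal sketch of that selection on a plain dict, assuming that is the intended behavior; `pop_first_form_xobject` and the sample keys are hypothetical, and a real PyPDF2 `/XObject` dictionary is only assumed to behave like a dict here.

```python
def pop_first_form_xobject(xobject, prefix='/FormXob'):
    """Remove and return the first key starting with `prefix`, or None."""
    form = next((name for name in xobject if name.startswith(prefix)), None)
    if form is not None:
        xobject.pop(form)
    return form


# Hypothetical resource names standing in for a page's /XObject entries.
sample = {'/Im0': 'image', '/FormXob.abc': 'watermark', '/FormXob.def': 'other'}
removed = pop_first_form_xobject(sample)
print(removed, sorted(sample))
```

Using `next()` with a default avoids the separate `flag` variable, so there is nothing to forget to initialize.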
    No Comments Yet