First commit
Original PyPDF code. Updates should be coming from Noah soon.

commit c59a212a4c

@ -0,0 +1,2 @@
*.pyc
*.swp
@ -0,0 +1,205 @@
Version 1.12, 2008-09-02
------------------------

- Added support for XMP metadata.

- Fix reading files with xref streams with multiple /Index values.

- Fix extracting content streams that use graphics operators longer than 2
  characters. Affects merging PDF files.


Version 1.11, 2008-05-09
------------------------

- Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject
  or FloatObject values.

- PDF compatibility fixes.

- Fix to read object xref stream in correct order.

- Fix for comments inside content streams.


Version 1.10, 2007-10-04
------------------------

- Text strings from PDF files are returned as Unicode string objects when
  pyPdf determines that they can be decoded (as UTF-16 strings, or as
  PDFDocEncoding strings). Unicode objects are also written out when
  necessary. This means that string objects in pyPdf can be either
  generic.ByteStringObject instances, or generic.TextStringObject instances.

- The extractText method now returns a unicode string object.

- All document information properties now return unicode string objects. In
  the event that a document provides docinfo properties that are not decoded
  by pyPdf, the raw byte strings can be accessed with a "_raw" property
  (i.e., title_raw rather than title).

- generic.DictionaryObject instances have been enhanced to be easier to use.
  Values coming out of dictionary objects will automatically be de-referenced
  (.getObject will be called on them), unless accessed by the new "raw_get"
  method. DictionaryObjects can now only contain PdfObject instances (as keys
  and values), making it easier to debug where non-PdfObject values (which
  cannot be written out) are entering dictionaries.

- Support for reading named destinations and outlines in PDF files. Original
  patch by Ashish Kulkarni.

- Stream compatibility reading enhancements for malformed PDF files.

- Cross reference table reading enhancements for malformed PDF files.

- Encryption documentation.

- Replace some "assert" statements with error raising.

- Minor optimizations to the FlateDecode algorithm increase speed when using
  PNG predictors.


Version 1.9, 2006-12-15
-----------------------

- Fix several serious bugs introduced in version 1.8, caused by a failure to
  run through our PDF test suite before releasing that version.

- Fix bug in NullObject reading and writing.


Version 1.8, 2006-12-14
-----------------------

- Add support for decryption with the standard PDF security handler. This
  allows for decrypting PDF files given the proper user or owner password.

- Add support for encryption with the standard PDF security handler.

- Add new pythondoc documentation.

- Fix bug in ASCII85 decode that occurs when whitespace exists inside the
  two terminating characters of the stream.


Version 1.7, 2006-12-10
-----------------------

- Fix a bug when using a single page object in two PdfFileWriter objects.

- Adjust PyPDF to be tolerant of whitespace characters that don't belong
  inside a stream object.

- Add documentInfo property to PdfFileReader.

- Add numPages property to PdfFileReader.

- Add pages property to PdfFileReader.

- Add extractText function to PdfFileReader.


Version 1.6, 2006-06-06
-----------------------

- Add basic support for comments in PDF files. This allows us to read some
  ReportLab PDFs that could not be read before.

- Add "auto-repair" for finding the xref table at slightly bad locations.

- New StreamObject backend, cleaner and more powerful. Allows the use of
  stream filters more easily, including compressed streams.

- Add a graphics state push/pop around page merges. Improves quality of
  page merges when one page's content stream leaves the graphics state
  in an abnormal state.

- Add PageObject.compressContentStreams function, which filters all content
  streams and compresses them. This will reduce the size of PDF pages,
  especially after they have been decompressed in a mergePage operation.

- Support inline images in PDF content streams.

- Add support for using .NET framework compression when zlib is not
  available. This does not make pyPdf compatible with IronPython, but it
  is a first step.

- Add support for reading the document information dictionary, and extracting
  title, author, subject, producer and creator tags.

- Add patch to support NullObject and multiple xref streams, from Bradley
  Lawrence.


Version 1.5, 2006-01-28
-----------------------

- Fix a bug where merging pages did not work in "no-rename" cases when the
  second page has an array of content streams.

- Remove some debugging output that should not have been present.


Version 1.4, 2006-01-27
-----------------------

- Add capability to merge pages from multiple PDF files into a single page
  using the PageObject.mergePage function. See example code (README or web
  site) for more information.

- Add ability to modify a page's MediaBox, CropBox, BleedBox, TrimBox, and
  ArtBox properties through PageObject. See example code (README or web site)
  for more information.

- Refactor pdf.py into multiple files: generic.py (contains objects like
  NameObject, DictionaryObject), filters.py (contains filter code),
  utils.py (various). This does not affect importing PdfFileReader
  or PdfFileWriter.

- Add new decoding functions for standard PDF filters ASCIIHexDecode and
  ASCII85Decode.

- Change url and download_url to refer to the new pybrary.net web site.


Version 1.3, 2006-01-23
-----------------------

- Fix a new bug introduced in 1.2 where PDF files with \r line endings did
  not work properly anymore. A new test suite developed with various PDF
  files should prevent regression bugs from now on.

- Fix a bug where inheriting attributes from page nodes did not work.


Version 1.2, 2006-01-23
-----------------------

- Improved support for files with CRLF-based line endings, fixing a commonly
  reported problem stating "assertion error: assert line == "%%EOF"".

- Software author/maintainer is now officially a proud married person, which
  is sure to result in better software... somehow.


Version 1.1, 2006-01-18
-----------------------

- Add capability to rotate pages.

- Improved PDF reading support to properly manage inherited attributes from
  /Type=/Pages nodes. This means that page groups that are rotated or have
  different media boxes or whatever will now work properly.

- Added PDF 1.5 support. Namely cross-reference streams and object streams.
  This release can mangle Adobe's PDFReference16.pdf successfully.


Version 1.0, 2006-01-17
-----------------------

- First distutils-capable true public release. Supports a wide variety of PDF
  files that I found sitting around on my system.

- Does not support some PDF 1.5 features, such as object streams and
  cross-reference streams.

@ -0,0 +1,28 @@
Copyright (c) 2006-2008, Mathieu Fenniak
Some contributions copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.
* The name of the author may not be used to endorse or promote products
  derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

@ -0,0 +1 @@
include CHANGELOG

@ -0,0 +1,4 @@
from pdf import PdfFileReader, PdfFileWriter
from merger import PdfFileMerger

__all__ = ["pdf", "PdfFileMerger"]

@ -0,0 +1,252 @@
# vim: sw=4:expandtab:foldmethod=marker
#
# Copyright (c) 2006, Mathieu Fenniak
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * The name of the author may not be used to endorse or promote products
#   derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.


"""
Implementation of stream filters for PDF.
"""
__author__ = "Mathieu Fenniak"
__author_email__ = "biziqe@mathieu.fenniak.net"

from utils import PdfReadError
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

try:
    import zlib

    def decompress(data):
        return zlib.decompress(data)

    def compress(data):
        return zlib.compress(data)
except ImportError:
    # Unable to import zlib.  Attempt to use the System.IO.Compression
    # library from the .NET framework.  (IronPython only.)
    import System
    from System import IO, Collections, Array

    def _string_to_bytearr(buf):
        retval = Array.CreateInstance(System.Byte, len(buf))
        for i in range(len(buf)):
            retval[i] = ord(buf[i])
        return retval

    def _bytearr_to_string(bytes):
        retval = ""
        for i in range(bytes.Length):
            retval += chr(bytes[i])
        return retval

    def _read_bytes(stream):
        ms = IO.MemoryStream()
        buf = Array.CreateInstance(System.Byte, 2048)
        while True:
            bytes = stream.Read(buf, 0, buf.Length)
            if bytes == 0:
                break
            else:
                ms.Write(buf, 0, bytes)
        retval = ms.ToArray()
        ms.Close()
        return retval

    def decompress(data):
        bytes = _string_to_bytearr(data)
        ms = IO.MemoryStream()
        ms.Write(bytes, 0, bytes.Length)
        ms.Position = 0  # fseek 0
        gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Decompress)
        bytes = _read_bytes(gz)
        retval = _bytearr_to_string(bytes)
        gz.Close()
        return retval

    def compress(data):
        bytes = _string_to_bytearr(data)
        ms = IO.MemoryStream()
        gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Compress, True)
        gz.Write(bytes, 0, bytes.Length)
        gz.Close()
        ms.Position = 0  # fseek 0
        bytes = ms.ToArray()
        retval = _bytearr_to_string(bytes)
        ms.Close()
        return retval


class FlateDecode(object):
    def decode(data, decodeParms):
        data = decompress(data)
        predictor = 1
        if decodeParms:
            predictor = decodeParms.get("/Predictor", 1)
        # predictor 1 == no predictor
        if predictor != 1:
            columns = decodeParms["/Columns"]
            # PNG prediction:
            if predictor >= 10 and predictor <= 15:
                output = StringIO()
                # PNG prediction can vary from row to row
                rowlength = columns + 1
                assert len(data) % rowlength == 0
                prev_rowdata = (0,) * rowlength
                for row in xrange(len(data) / rowlength):
                    rowdata = [ord(x) for x in data[(row*rowlength):((row+1)*rowlength)]]
                    filterByte = rowdata[0]
                    if filterByte == 0:
                        pass
                    elif filterByte == 1:
                        # "Sub" filter: delta from the byte to the left
                        for i in range(2, rowlength):
                            rowdata[i] = (rowdata[i] + rowdata[i-1]) % 256
                    elif filterByte == 2:
                        # "Up" filter: delta from the byte in the previous row
                        for i in range(1, rowlength):
                            rowdata[i] = (rowdata[i] + prev_rowdata[i]) % 256
                    else:
                        # unsupported PNG filter
                        raise PdfReadError("Unsupported PNG filter %r" % filterByte)
                    prev_rowdata = rowdata
                    output.write(''.join([chr(x) for x in rowdata[1:]]))
                data = output.getvalue()
            else:
                # unsupported predictor
                raise PdfReadError("Unsupported flatedecode predictor %r" % predictor)
        return data
    decode = staticmethod(decode)

    def encode(data):
        return compress(data)
    encode = staticmethod(encode)
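The PNG-predictor loop above reverses per-row filters by adding the stored deltas back modulo 256. A standalone sketch of the idea for the type-1 "Sub" filter (plain illustrative Python, not part of this commit):

```python
# Reversing the PNG "Sub" filter (type 1): each filtered byte stores the
# difference from its left neighbour, so decoding adds neighbours back mod 256.
raw = [10, 200, 30, 255]
filtered = [raw[0]] + [(raw[i] - raw[i - 1]) % 256 for i in range(1, len(raw))]

recovered = [filtered[0]]
for byte in filtered[1:]:
    recovered.append((recovered[-1] + byte) % 256)

assert recovered == raw
```

The real decoder does the same arithmetic, except that each row carries its filter type in a leading byte and the "Up" filter adds bytes from the previous row instead.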


class ASCIIHexDecode(object):
    def decode(data, decodeParms=None):
        retval = ""
        char = ""
        x = 0
        while True:
            c = data[x]
            if c == ">":
                break
            elif c.isspace():
                x += 1
                continue
            char += c
            if len(char) == 2:
                retval += chr(int(char, base=16))
                char = ""
            x += 1
        assert char == ""
        return retval
    decode = staticmethod(decode)
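The decoder above pairs up hex digits, skipping whitespace and stopping at the ">" terminator. The same result can be cross-checked against the standard library (Python's binascii, not something this module uses):

```python
import binascii

stream = "61\n626\n3>"  # hex digits may be split by whitespace; ">" terminates
hexbody = "".join(stream[:stream.index(">")].split())
assert binascii.unhexlify(hexbody) == b"abc"
```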


class ASCII85Decode(object):
    def decode(data, decodeParms=None):
        retval = ""
        group = []
        x = 0
        hitEod = False
        # remove all whitespace from data
        data = [y for y in data if not (y in ' \n\r\t')]
        while not hitEod:
            c = data[x]
            if len(retval) == 0 and c == "<" and data[x+1] == "~":
                x += 2
                continue
            #elif c.isspace():
            #    x += 1
            #    continue
            elif c == 'z':
                assert len(group) == 0
                retval += '\x00\x00\x00\x00'
                x += 1  # advance past 'z'; without this the loop never terminates
                continue
            elif c == "~" and data[x+1] == ">":
                if len(group) != 0:
                    # cannot have a final group of just 1 char
                    assert len(group) > 1
                    cnt = len(group) - 1
                    group += [ 85, 85, 85 ]
                    hitEod = cnt
                else:
                    break
            else:
                c = ord(c) - 33
                assert c >= 0 and c < 85
                group += [ c ]
            if len(group) >= 5:
                b = group[0] * (85**4) + \
                    group[1] * (85**3) + \
                    group[2] * (85**2) + \
                    group[3] * 85 + \
                    group[4]
                assert b < (2**32 - 1)
                c4 = chr((b >> 0) % 256)
                c3 = chr((b >> 8) % 256)
                c2 = chr((b >> 16) % 256)
                c1 = chr(b >> 24)
                retval += (c1 + c2 + c3 + c4)
                if hitEod:
                    retval = retval[:-4+hitEod]
                group = []
            x += 1
        return retval
    decode = staticmethod(decode)


def decodeStreamData(stream):
    from generic import NameObject
    filters = stream.get("/Filter", ())
    if len(filters) and not isinstance(filters[0], NameObject):
        # we have a single filter instance
        filters = (filters,)
    data = stream._data
    for filterType in filters:
        if filterType == "/FlateDecode":
            data = FlateDecode.decode(data, stream.get("/DecodeParms"))
        elif filterType == "/ASCIIHexDecode":
            data = ASCIIHexDecode.decode(data)
        elif filterType == "/ASCII85Decode":
            data = ASCII85Decode.decode(data)
        elif filterType == "/Crypt":
            decodeParams = stream.get("/DecodeParms", {})  # spec key is /DecodeParms
            if "/Name" not in decodeParams and "/Type" not in decodeParams:
                pass
            else:
                raise NotImplementedError("/Crypt filter with /Name or /Type not supported yet")
        else:
            # unsupported filter
            raise NotImplementedError("unsupported filter %s" % filterType)
    return data
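decodeStreamData applies each named filter in sequence, feeding the output of one stage into the next. A toy illustration of that chaining using only stdlib codecs (Python 3's base64 and zlib stand in for the classes above; the content-stream bytes are made up):

```python
import base64
import zlib

original = b"BT /F1 12 Tf (Hello) Tj ET"  # hypothetical page content stream

# Encoding order: deflate first, then ASCII85-armor (Adobe style, <~ ... ~>).
encoded = base64.a85encode(zlib.compress(original), adobe=True)

# Decoding runs the stages in reverse: un-armor first, then inflate.
# In a PDF this matches the order the names appear in the /Filter array.
decoded = zlib.decompress(base64.a85decode(encoded, adobe=True))
assert decoded == original
```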


if __name__ == "__main__":
    assert "abc" == ASCIIHexDecode.decode('61\n626\n3>')

    ascii85Test = """
<~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
O<DJ+*.@<*K0@<6L(Df-\\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
>uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>
"""
    ascii85_originalText = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
    assert ASCII85Decode.decode(ascii85Test) == ascii85_originalText

File diff suppressed because it is too large

@ -0,0 +1,401 @@
# vim: sw=4:expandtab:foldmethod=marker
#
# Copyright (c) 2006, Mathieu Fenniak
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * The name of the author may not be used to endorse or promote products
#   derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

from generic import *
from pdf import PdfFileReader, PdfFileWriter, Destination


class _MergedPage(object):
    """
    _MergedPage is used internally by PdfFileMerger to collect necessary
    information on each page that is being merged.
    """
    def __init__(self, pagedata, src, id):
        self.src = src
        self.pagedata = pagedata
        self.out_pagedata = None
        self.id = id


class PdfFileMerger(object):
    """
    PdfFileMerger merges multiple PDFs into a single PDF.  It can concatenate,
    slice, insert, or any combination of the above.

    See the functions "merge" (or "append") and "write" (or "overwrite") for
    usage information.
    """

    def __init__(self):
        """
        >>> PdfFileMerger()

        Initializes a PdfFileMerger; no parameters are required.
        """
        self.inputs = []
        self.pages = []
        self.output = PdfFileWriter()
        self.bookmarks = []
        self.named_dests = []
        self.id_count = 0

    def merge(self, position, fileobj, bookmark=None, pages=None, import_bookmarks=True):
        """
        >>> merge(position, file, bookmark=None, pages=None, import_bookmarks=True)

        Merges the pages from the source document specified by "file" into the
        output file at the page number specified by "position".

        Optionally, you may specify a bookmark to be applied at the beginning
        of the included file by supplying the text of the bookmark in the
        "bookmark" parameter.

        You may prevent the source document's bookmarks from being imported by
        specifying "import_bookmarks" as False.

        You may also use the "pages" parameter to merge only the specified
        range of pages from the source document into the output document.
        """

        my_file = False
        if type(fileobj) in (str, unicode):
            fileobj = file(fileobj, 'rb')
            my_file = True

        if type(fileobj) == PdfFileReader:
            pdfr = fileobj
            fileobj = pdfr.file
        else:
            pdfr = PdfFileReader(fileobj)

        # Find the range of pages to merge
        if pages == None:
            pages = (0, pdfr.getNumPages())
        elif type(pages) in (int, float, str, unicode):
            raise TypeError('"pages" must be a tuple of (start, end)')

        srcpages = []

        if bookmark:
            bookmark = Bookmark(TextStringObject(bookmark), NumberObject(self.id_count), NameObject('/Fit'))

        outline = []
        if import_bookmarks:
            outline = pdfr.getOutlines()
            outline = self._trim_outline(pdfr, outline, pages)

        if bookmark:
            self.bookmarks += [bookmark, outline]
        else:
            self.bookmarks += outline

        dests = pdfr.namedDestinations
        dests = self._trim_dests(pdfr, dests, pages)
        self.named_dests += dests

        # Gather all the pages that are going to be merged
        for i in range(*pages):
            pg = pdfr.getPage(i)

            id = self.id_count
            self.id_count += 1

            mp = _MergedPage(pg, pdfr, id)

            srcpages.append(mp)

        self._associate_dests_to_pages(srcpages)
        self._associate_bookmarks_to_pages(srcpages)

        # Slice to insert the pages at the specified position
        self.pages[position:position] = srcpages

        # Keep track of our input files so we can close them later
        self.inputs.append((fileobj, pdfr, my_file))
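The merge method places the gathered pages with slice assignment, which splices a whole list in at a position without overwriting anything. A minimal demonstration of that idiom on plain lists:

```python
pages = ["p0", "p1", "p2"]
incoming = ["a", "b"]
position = 1

# A zero-width slice assignment inserts every element of "incoming"
# before index 1; no existing element is replaced.
pages[position:position] = incoming
assert pages == ["p0", "a", "b", "p1", "p2"]
```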

    def append(self, fileobj, bookmark=None, pages=None, import_bookmarks=True):
        """
        >>> append(file, bookmark=None, pages=None, import_bookmarks=True)

        Identical to the "merge" function, but assumes you want to concatenate
        all pages onto the end of the file instead of specifying a position.
        """

        self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)

    def write(self, fileobj):
        """
        >>> write(file)

        Writes all data that has been merged to "file" (which can be a
        filename or any kind of file-like object).
        """
        my_file = False
        if type(fileobj) in (str, unicode):
            fileobj = file(fileobj, 'wb')
            my_file = True

        # Add pages to the PdfFileWriter
        for page in self.pages:
            self.output.addPage(page.pagedata)
            page.out_pagedata = self.output.getReference(self.output._pages.getObject()["/Kids"][-1].getObject())

        # Once all pages are added, create bookmarks to point at those pages
        self._write_dests()
        self._write_bookmarks()

        # Write the output to the file
        self.output.write(fileobj)

        if my_file:
            fileobj.close()

    def close(self):
        """
        >>> close()

        Shuts all file descriptors (input and output) and clears all memory
        usage.
        """
        self.pages = []
        for fo, pdfr, mine in self.inputs:
            if mine:
                fo.close()

        self.inputs = []
        self.output = None

    def _trim_dests(self, pdf, dests, pages):
        """
        Removes any named destinations that are not a part of the specified
        page set.
        """
        new_dests = []
        for k, o in dests.items():
            for j in range(*pages):
                if pdf.getPage(j).getObject() == o['/Page'].getObject():
                    o[NameObject('/Page')] = o['/Page'].getObject()
                    assert str(k) == str(o['/Title'])
                    new_dests.append(o)
                    break
        return new_dests

    def _trim_outline(self, pdf, outline, pages):
        """
        Removes any outline/bookmark entries that are not a part of the
        specified page set.
        """
        new_outline = []
        prev_header_added = True
        for i, o in enumerate(outline):
            if type(o) == list:
                sub = self._trim_outline(pdf, o, pages)
                if sub:
                    if not prev_header_added:
                        new_outline.append(outline[i-1])
                    new_outline.append(sub)
            else:
                prev_header_added = False
                for j in range(*pages):
                    if pdf.getPage(j).getObject() == o['/Page'].getObject():
                        o[NameObject('/Page')] = o['/Page'].getObject()
                        new_outline.append(o)
                        prev_header_added = True
                        break
        return new_outline
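_trim_outline walks a nested list of bookmark entries, keeping an entry when its page survives and keeping a sub-list only when it is non-empty after trimming, re-attaching the header entry that immediately precedes it. A toy version of the same recursion on plain strings (the outline values and "keep" set are invented for illustration):

```python
def trim(outline, keep):
    # Keep leaf entries found in "keep"; recurse into nested lists, and when
    # a trimmed sub-list survives, re-attach its preceding header entry if
    # that header was itself dropped.
    result = []
    prev_header_added = True
    for i, o in enumerate(outline):
        if isinstance(o, list):
            sub = trim(o, keep)
            if sub:
                if not prev_header_added:
                    result.append(outline[i - 1])
                result.append(sub)
        else:
            prev_header_added = False
            if o in keep:
                result.append(o)
                prev_header_added = True
    return result

outline = ["ch1", ["s1a", "s1b"], "ch2", ["s2a"]]
assert trim(outline, {"s1a", "ch2"}) == ["ch1", ["s1a"], "ch2"]
```

"ch1" is dropped on its own but restored because its surviving child "s1a" needs a parent header, mirroring the `prev_header_added` bookkeeping above.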
|
||||||
|
|
||||||
|
def _write_dests(self):
|
||||||
|
dests = self.named_dests
|
||||||
|
|
||||||
|
for v in dests:
|
||||||
|
pageno = None
|
||||||
|
pdf = None
|
||||||
|
if v.has_key('/Page'):
|
||||||
|
for i, p in enumerate(self.pages):
|
||||||
|
if p.id == v['/Page']:
|
||||||
|
v[NameObject('/Page')] = p.out_pagedata
|
||||||
|
pageno = i
|
||||||
|
pdf = p.src
|
||||||
|
if pageno != None:
|
||||||
|
self.output.addNamedDestinationObject(v)
|
||||||
|
|
||||||
|
def _write_bookmarks(self, bookmarks=None, parent=None):
|
||||||
|
|
||||||
|
if bookmarks == None:
|
||||||
|
bookmarks = self.bookmarks
|
||||||
|
|
||||||
|
|
||||||
|
last_added = None
|
||||||
|
for b in bookmarks:
|
||||||
|
if type(b) == list:
|
||||||
|
self._write_bookmarks(b, last_added)
|
||||||
|
continue
|
||||||
|
|
||||||
|
pageno = None
|
||||||
|
pdf = None
|
||||||
|
if b.has_key('/Page'):
|
||||||
|
for i, p in enumerate(self.pages):
|
||||||
|
if p.id == b['/Page']:
|
||||||
|
b[NameObject('/Page')] = p.out_pagedata
|
||||||
|
pageno = i
|
||||||
|
pdf = p.src
|
||||||
|
if pageno != None:
|
||||||
|
last_added = self.output.addBookmarkDestination(b, parent)
|
||||||
|
|
||||||
|
|
||||||
|
    def _associate_dests_to_pages(self, pages):
        for nd in self.named_dests:
            pageno = None
            np = nd['/Page']

            if type(np) == NumberObject:
                continue

            for p in pages:
                if np.getObject() == p.pagedata.getObject():
                    pageno = p.id

            if pageno != None:
                nd[NameObject('/Page')] = NumberObject(pageno)
            else:
                raise ValueError, "Unresolved named destination '%s'" % (nd['/Title'],)

    def _associate_bookmarks_to_pages(self, pages, bookmarks=None):
        if bookmarks == None:
            bookmarks = self.bookmarks

        for b in bookmarks:
            if type(b) == list:
                self._associate_bookmarks_to_pages(pages, b)
                continue

            pageno = None
            bp = b['/Page']

            if type(bp) == NumberObject:
                continue

            for p in pages:
                if bp.getObject() == p.pagedata.getObject():
                    pageno = p.id

            if pageno != None:
                b[NameObject('/Page')] = NumberObject(pageno)
            else:
                raise ValueError, "Unresolved bookmark '%s'" % (b['/Title'],)

    def findBookmark(self, bookmark, root=None):
        if root == None:
            root = self.bookmarks

        for i, b in enumerate(root):
            if type(b) == list:
                res = self.findBookmark(bookmark, b)
                if res:
                    return [i] + res
            elif b == bookmark or b['/Title'] == bookmark:
                return [i]

        return None

    def addBookmark(self, title, pagenum, parent=None):
        """
        Add a bookmark to the pdf, using the specified title and pointing at
        the specified page number. A parent can be specified to make this a
        nested bookmark below the parent.
        """
        if parent == None:
            iloc = [len(self.bookmarks)-1]
        elif type(parent) == list:
            iloc = parent
        else:
            iloc = self.findBookmark(parent)

        dest = Bookmark(TextStringObject(title), NumberObject(pagenum), NameObject('/FitH'), NumberObject(826))

        if parent == None:
            self.bookmarks.append(dest)
        else:
            bmparent = self.bookmarks
            for i in iloc[:-1]:
                bmparent = bmparent[i]
            npos = iloc[-1]+1
            if npos < len(bmparent) and type(bmparent[npos]) == list:
                bmparent[npos].append(dest)
            else:
                bmparent.insert(npos, [dest])

    def addNamedDestination(self, title, pagenum):
        """
        Add a destination to the pdf, using the specified title and pointing
        at the specified page number.
        """
        dest = Destination(TextStringObject(title), NumberObject(pagenum), NameObject('/FitH'), NumberObject(826))
        self.named_dests.append(dest)


class OutlinesObject(list):
    def __init__(self, pdf, tree, parent=None):
        list.__init__(self)
        self.tree = tree
        self.pdf = pdf
        self.parent = parent

    def remove(self, index):
        obj = self[index]
        del self[index]
        self.tree.removeChild(obj)

    def add(self, title, page):
        pageRef = self.pdf.getObject(self.pdf._pages)['/Kids'][page]
        action = DictionaryObject()
        action.update({
            NameObject('/D') : ArrayObject([pageRef, NameObject('/FitH'), NumberObject(826)]),
            NameObject('/S') : NameObject('/GoTo')
        })
        actionRef = self.pdf._addObject(action)
        bookmark = TreeObject()

        bookmark.update({
            NameObject('/A') : actionRef,
            NameObject('/Title') : createStringObject(title),
        })

        self.pdf._addObject(bookmark)

        self.tree.addChild(bookmark)

    def removeAll(self):
        for child in [x for x in self.tree.children()]:
            self.tree.removeChild(child)
            self.pop()
File diff suppressed because it is too large
@@ -0,0 +1,125 @@

# vim: sw=4:expandtab:foldmethod=marker
#
# Copyright (c) 2006, Mathieu Fenniak
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * The name of the author may not be used to endorse or promote products
#   derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.


"""
Utility functions for PDF library.
"""
__author__ = "Mathieu Fenniak"
__author_email__ = "biziqe@mathieu.fenniak.net"

#ENABLE_PSYCO = False
#if ENABLE_PSYCO:
#    try:
#        import psyco
#    except ImportError:
#        ENABLE_PSYCO = False
#
#if not ENABLE_PSYCO:
#    class psyco:
#        def proxy(func):
#            return func
#        proxy = staticmethod(proxy)

def readUntilWhitespace(stream, maxchars=None):
    txt = ""
    while True:
        tok = stream.read(1)
        if tok.isspace() or not tok:
            break
        txt += tok
        if len(txt) == maxchars:
            break
    return txt

def readNonWhitespace(stream):
    tok = ' '
    while tok == '\n' or tok == '\r' or tok == ' ' or tok == '\t':
        tok = stream.read(1)
    return tok

class ConvertFunctionsToVirtualList(object):
    def __init__(self, lengthFunction, getFunction):
        self.lengthFunction = lengthFunction
        self.getFunction = getFunction

    def __len__(self):
        return self.lengthFunction()

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError, "sequence indices must be integers"
        len_self = len(self)
        if index < 0:
            # support negative indexes
            index = len_self + index
        if index < 0 or index >= len_self:
            raise IndexError, "sequence index out of range"
        return self.getFunction(index)
|
||||||
|
|
||||||
|
def RC4_encrypt(key, plaintext):
|
||||||
|
S = [i for i in range(256)]
|
||||||
|
j = 0
|
||||||
|
for i in range(256):
|
||||||
|
j = (j + S[i] + ord(key[i % len(key)])) % 256
|
||||||
|
S[i], S[j] = S[j], S[i]
|
||||||
|
i, j = 0, 0
|
||||||
|
retval = ""
|
||||||
|
for x in range(len(plaintext)):
|
||||||
|
i = (i + 1) % 256
|
||||||
|
j = (j + S[i]) % 256
|
||||||
|
S[i], S[j] = S[j], S[i]
|
||||||
|
t = S[(S[i] + S[j]) % 256]
|
||||||
|
retval += chr(ord(plaintext[x]) ^ t)
|
||||||
|
return retval
|
||||||
|
|
||||||
|
def matrixMultiply(a, b):
|
||||||
|
return [[sum([float(i)*float(j)
|
||||||
|
for i, j in zip(row, col)]
|
||||||
|
) for col in zip(*b)]
|
||||||
|
for row in a]
|
||||||
|
|
||||||
|
class PyPdfError(Exception):
|
||||||
|
pass
|
||||||
|
|
||||||
|
class PdfReadError(PyPdfError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
class PageSizeNotDefinedError(PyPdfError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
class PdfReadWarning(UserWarning):
|
||||||
|
pass
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
# test RC4
|
||||||
|
out = RC4_encrypt("Key", "Plaintext")
|
||||||
|
print repr(out)
|
||||||
|
pt = RC4_encrypt("Key", out)
|
||||||
|
print repr(pt)
|
|
@ -0,0 +1,355 @@
|
||||||
|
import re
import datetime
import decimal
from generic import PdfObject
from xml.dom import getDOMImplementation
from xml.dom.minidom import parseString

RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
XMP_NAMESPACE = "http://ns.adobe.com/xap/1.0/"
PDF_NAMESPACE = "http://ns.adobe.com/pdf/1.3/"
XMPMM_NAMESPACE = "http://ns.adobe.com/xap/1.0/mm/"

# What is the PDFX namespace, you might ask? I might ask that too. It's
# a completely undocumented namespace used to place "custom metadata"
# properties, which are arbitrary metadata properties with no semantic or
# documented meaning. Elements in the namespace are key/value-style storage,
# where the element name is the key and the content is the value. The keys
# are transformed into valid XML identifiers by substituting an invalid
# identifier character with \u2182 followed by the unicode hex ID of the
# original character. A key like "my car" is therefore "my\u21820020car".
#
# \u2182, in case you're wondering, is the unicode character
# \u{ROMAN NUMERAL TEN THOUSAND}, a straightforward and obvious choice for
# escaping characters.
#
# Intentional users of the pdfx namespace should be shot on sight. A
# custom data schema and sensible XML elements could be used instead, as is
# suggested by Adobe's own documentation on XMP (under "Extensibility of
# Schemas").
#
# Information presented here on the /pdfx/ schema is a result of limited
# reverse engineering, and does not constitute a full specification.
PDFX_NAMESPACE = "http://ns.adobe.com/pdfx/1.3/"

iso8601 = re.compile("""
        (?P<year>[0-9]{4})
        (-
            (?P<month>[0-9]{2})
            (-
                (?P<day>[0-9]+)
                (T
                    (?P<hour>[0-9]{2}):
                    (?P<minute>[0-9]{2})
                    (:(?P<second>[0-9]{2}(.[0-9]+)?))?
                    (?P<tzd>Z|[-+][0-9]{2}:[0-9]{2})
                )?
            )?
        )?
        """, re.VERBOSE)

##
# An object that represents Adobe XMP metadata.
class XmpInformation(PdfObject):

    def __init__(self, stream):
        self.stream = stream
        docRoot = parseString(self.stream.getData())
        self.rdfRoot = docRoot.getElementsByTagNameNS(RDF_NAMESPACE, "RDF")[0]
        self.cache = {}

    def writeToStream(self, stream, encryption_key):
        self.stream.writeToStream(stream, encryption_key)

    def getElement(self, aboutUri, namespace, name):
        for desc in self.rdfRoot.getElementsByTagNameNS(RDF_NAMESPACE, "Description"):
            if desc.getAttributeNS(RDF_NAMESPACE, "about") == aboutUri:
                attr = desc.getAttributeNodeNS(namespace, name)
                if attr != None:
                    yield attr
                for element in desc.getElementsByTagNameNS(namespace, name):
                    yield element

    def getNodesInNamespace(self, aboutUri, namespace):
        for desc in self.rdfRoot.getElementsByTagNameNS(RDF_NAMESPACE, "Description"):
            if desc.getAttributeNS(RDF_NAMESPACE, "about") == aboutUri:
                for i in range(desc.attributes.length):
                    attr = desc.attributes.item(i)
                    if attr.namespaceURI == namespace:
                        yield attr
                for child in desc.childNodes:
                    if child.namespaceURI == namespace:
                        yield child

    def _getText(self, element):
        text = ""
        for child in element.childNodes:
            if child.nodeType == child.TEXT_NODE:
                text += child.data
        return text

    def _converter_string(value):
        return value

    def _converter_date(value):
        m = iso8601.match(value)
        year = int(m.group("year"))
        month = int(m.group("month") or "1")
        day = int(m.group("day") or "1")
        hour = int(m.group("hour") or "0")
        minute = int(m.group("minute") or "0")
        second = decimal.Decimal(m.group("second") or "0")
        seconds = second.to_integral(decimal.ROUND_FLOOR)
        milliseconds = (second - seconds) * 1000000
        tzd = m.group("tzd") or "Z"
        dt = datetime.datetime(year, month, day, hour, minute, seconds, milliseconds)
        if tzd != "Z":
            tzd_hours, tzd_minutes = [int(x) for x in tzd.split(":")]
            tzd_hours *= -1
            if tzd_hours < 0:
                tzd_minutes *= -1
            dt = dt + datetime.timedelta(hours=tzd_hours, minutes=tzd_minutes)
        return dt
    _test_converter_date = staticmethod(_converter_date)

    def _getter_bag(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            retval = []
            for element in self.getElement("", namespace, name):
                bags = element.getElementsByTagNameNS(RDF_NAMESPACE, "Bag")
                if len(bags):
                    for bag in bags:
                        for item in bag.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
                            value = self._getText(item)
                            value = converter(value)
                            retval.append(value)
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = retval
            return retval
        return get

    def _getter_seq(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            retval = []
            for element in self.getElement("", namespace, name):
                seqs = element.getElementsByTagNameNS(RDF_NAMESPACE, "Seq")
                if len(seqs):
                    for seq in seqs:
                        for item in seq.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
                            value = self._getText(item)
                            value = converter(value)
                            retval.append(value)
                else:
                    value = converter(self._getText(element))
                    retval.append(value)
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = retval
            return retval
        return get

    def _getter_langalt(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            retval = {}
            for element in self.getElement("", namespace, name):
                alts = element.getElementsByTagNameNS(RDF_NAMESPACE, "Alt")
                if len(alts):
                    for alt in alts:
                        for item in alt.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
                            value = self._getText(item)
                            value = converter(value)
                            retval[item.getAttribute("xml:lang")] = value
                else:
                    retval["x-default"] = converter(self._getText(element))
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = retval
            return retval
        return get

    def _getter_single(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            value = None
            for element in self.getElement("", namespace, name):
                if element.nodeType == element.ATTRIBUTE_NODE:
                    value = element.nodeValue
                else:
                    value = self._getText(element)
                break
            if value != None:
                value = converter(value)
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = value
            return value
        return get

    ##
    # Contributors to the resource (other than the authors). An unsorted
    # array of names.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_contributor = property(_getter_bag(DC_NAMESPACE, "contributor", _converter_string))

    ##
    # Text describing the extent or scope of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_coverage = property(_getter_single(DC_NAMESPACE, "coverage", _converter_string))

    ##
    # A sorted array of names of the authors of the resource, listed in order
    # of precedence.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_creator = property(_getter_seq(DC_NAMESPACE, "creator", _converter_string))

    ##
    # A sorted array of dates (datetime.datetime instances) of significance to
    # the resource. The dates and times are in UTC.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_date = property(_getter_seq(DC_NAMESPACE, "date", _converter_date))

    ##
    # A language-keyed dictionary of textual descriptions of the content of the
    # resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_description = property(_getter_langalt(DC_NAMESPACE, "description", _converter_string))

    ##
    # The mime-type of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_format = property(_getter_single(DC_NAMESPACE, "format", _converter_string))

    ##
    # Unique identifier of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_identifier = property(_getter_single(DC_NAMESPACE, "identifier", _converter_string))

    ##
    # An unordered array specifying the languages used in the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_language = property(_getter_bag(DC_NAMESPACE, "language", _converter_string))

    ##
    # An unordered array of publisher names.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_publisher = property(_getter_bag(DC_NAMESPACE, "publisher", _converter_string))

    ##
    # An unordered array of text descriptions of relationships to other
    # documents.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_relation = property(_getter_bag(DC_NAMESPACE, "relation", _converter_string))

    ##
    # A language-keyed dictionary of textual descriptions of the rights the
    # user has to this resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_rights = property(_getter_langalt(DC_NAMESPACE, "rights", _converter_string))

    ##
    # Unique identifier of the work from which this resource was derived.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_source = property(_getter_single(DC_NAMESPACE, "source", _converter_string))

    ##
    # An unordered array of descriptive phrases or keywords that specify the
    # topic of the content of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_subject = property(_getter_bag(DC_NAMESPACE, "subject", _converter_string))

    ##
    # A language-keyed dictionary of the title of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_title = property(_getter_langalt(DC_NAMESPACE, "title", _converter_string))

    ##
    # An unordered array of textual descriptions of the document type.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_type = property(_getter_bag(DC_NAMESPACE, "type", _converter_string))

    ##
    # An unformatted text string representing document keywords.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    pdf_keywords = property(_getter_single(PDF_NAMESPACE, "Keywords", _converter_string))

    ##
    # The PDF file version, for example 1.0, 1.3.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    pdf_pdfversion = property(_getter_single(PDF_NAMESPACE, "PDFVersion", _converter_string))

    ##
    # The name of the tool that created the PDF document.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    pdf_producer = property(_getter_single(PDF_NAMESPACE, "Producer", _converter_string))

    ##
    # The date and time the resource was originally created. The date and
    # time are returned as a UTC datetime.datetime object.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_createDate = property(_getter_single(XMP_NAMESPACE, "CreateDate", _converter_date))

    ##
    # The date and time the resource was last modified. The date and time
    # are returned as a UTC datetime.datetime object.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_modifyDate = property(_getter_single(XMP_NAMESPACE, "ModifyDate", _converter_date))

    ##
    # The date and time that any metadata for this resource was last
    # changed. The date and time are returned as a UTC datetime.datetime
    # object.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_metadataDate = property(_getter_single(XMP_NAMESPACE, "MetadataDate", _converter_date))

    ##
    # The name of the first known tool used to create the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_creatorTool = property(_getter_single(XMP_NAMESPACE, "CreatorTool", _converter_string))

    ##
    # The common identifier for all versions and renditions of this resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmpmm_documentId = property(_getter_single(XMPMM_NAMESPACE, "DocumentID", _converter_string))

    ##
    # An identifier for a specific incarnation of a document, updated each
    # time a file is saved.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmpmm_instanceId = property(_getter_single(XMPMM_NAMESPACE, "InstanceID", _converter_string))

    def custom_properties(self):
        if not hasattr(self, "_custom_properties"):
            self._custom_properties = {}
            for node in self.getNodesInNamespace("", PDFX_NAMESPACE):
                key = node.localName
                while True:
                    # see documentation about PDFX_NAMESPACE earlier in file
                    idx = key.find(u"\u2182")
                    if idx == -1:
                        break
                    key = key[:idx] + chr(int(key[idx+1:idx+5], base=16)) + key[idx+5:]
                if node.nodeType == node.ATTRIBUTE_NODE:
                    value = node.nodeValue
                else:
                    value = self._getText(node)
                self._custom_properties[key] = value
        return self._custom_properties

    ##
    # Retrieves custom metadata properties defined in the undocumented pdfx
    # metadata schema.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    # @return Returns a dictionary of key/value items for custom metadata
    # properties.
    custom_properties = property(custom_properties)

@@ -0,0 +1,38 @@

Example:
|
||||||
|
|
||||||
|
from pyPdf import PdfFileWriter, PdfFileReader
|
||||||
|
|
||||||
|
output = PdfFileWriter()
|
||||||
|
input1 = PdfFileReader(file("document1.pdf", "rb"))
|
||||||
|
|
||||||
|
# add page 1 from input1 to output document, unchanged
|
||||||
|
output.addPage(input1.getPage(0))
|
||||||
|
|
||||||
|
# add page 2 from input1, but rotated clockwise 90 degrees
|
||||||
|
output.addPage(input1.getPage(1).rotateClockwise(90))
|
||||||
|
|
||||||
|
# add page 3 from input1, rotated the other way:
|
||||||
|
output.addPage(input1.getPage(2).rotateCounterClockwise(90))
|
||||||
|
# alt: output.addPage(input1.getPage(2).rotateClockwise(270))
|
||||||
|
|
||||||
|
# add page 4 from input1, but first add a watermark from another pdf:
|
||||||
|
page4 = input1.getPage(3)
|
||||||
|
watermark = PdfFileReader(file("watermark.pdf", "rb"))
|
||||||
|
page4.mergePage(watermark.getPage(0))
|
||||||
|
|
||||||
|
# add page 5 from input1, but crop it to half size:
|
||||||
|
page5 = input1.getPage(4)
|
||||||
|
page5.mediaBox.upperRight = (
|
||||||
|
page5.mediaBox.getUpperRight_x() / 2,
|
||||||
|
page5.mediaBox.getUpperRight_y() / 2
|
||||||
|
)
|
||||||
|
output.addPage(page5)
|
||||||
|
|
||||||
|
# print how many pages input1 has:
|
||||||
|
print "document1.pdf has %s pages." % input1.getNumPages())
|
||||||
|
|
||||||
|
# finally, write "output" to document-output.pdf
|
||||||
|
outputStream = file("document-output.pdf", "wb")
|
||||||
|
output.write(outputStream)
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,40 @@
|
||||||
|
#!/usr/bin/env python

from distutils.core import setup

long_description = """
A Pure-Python library built as a PDF toolkit. It is capable of:

- extracting document information (title, author, ...),
- splitting documents page by page,
- merging documents page by page,
- cropping pages,
- merging multiple pages into a single page,
- encrypting and decrypting PDF files.

By being Pure-Python, it should run on any Python platform without any
dependencies on external libraries. It can also work entirely on StringIO
objects rather than file streams, allowing for PDF manipulation in memory.
It is therefore a useful tool for websites that manage or manipulate PDFs.
"""

setup(
    name="pyPdf",
    version="1.12",
    description="PDF toolkit",
    long_description=long_description,
    author="Mathieu Fenniak",
    author_email="biziqe@mathieu.fenniak.net",
    url="http://pybrary.net/pyPdf/",
    download_url="http://pybrary.net/pyPdf/pyPdf-1.12.tar.gz",
    classifiers = [
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: BSD License",
        "Programming Language :: Python",
        "Operating System :: OS Independent",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    packages=["pyPdf"],
)