First commit
Original PyPDF code. Updates should be coming from Noah soon.

commit c59a212a4c

@ -0,0 +1,2 @@
*.pyc
*.swp
@ -0,0 +1,205 @@
Version 1.12, 2008-09-02
------------------------

- Added support for XMP metadata.

- Fix reading files with xref streams with multiple /Index values.

- Fix extracting content streams that use graphics operators longer than 2
  characters. Affects merging PDF files.


Version 1.11, 2008-05-09
------------------------

- Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject
  or FloatObject values.

- PDF compatibility fixes.

- Fix to read object xref stream in correct order.

- Fix for comments inside content streams.


Version 1.10, 2007-10-04
------------------------

- Text strings from PDF files are returned as Unicode string objects when
  pyPdf determines that they can be decoded (as UTF-16 strings, or as
  PDFDocEncoding strings). Unicode objects are also written out when
  necessary. This means that string objects in pyPdf can be either
  generic.ByteStringObject instances, or generic.TextStringObject instances.

- The extractText method now returns a unicode string object.

- All document information properties now return unicode string objects. In
  the event that a document provides docinfo properties that are not decoded
  by pyPdf, the raw byte strings can be accessed with a "_raw" property
  (i.e., title_raw rather than title).

- generic.DictionaryObject instances have been enhanced to be easier to use.
  Values coming out of dictionary objects will automatically be de-referenced
  (.getObject will be called on them), unless accessed by the new "raw_get"
  method. DictionaryObjects can now only contain PdfObject instances (as keys
  and values), making it easier to debug where non-PdfObject values (which
  cannot be written out) are entering dictionaries.

- Support for reading named destinations and outlines in PDF files. Original
  patch by Ashish Kulkarni.

- Stream compatibility reading enhancements for malformed PDF files.

- Cross reference table reading enhancements for malformed PDF files.

- Encryption documentation.

- Replace some "assert" statements with error raising.

- Minor optimizations to the FlateDecode algorithm increase speed when using
  PNG predictors.


Version 1.9, 2006-12-15
-----------------------

- Fix several serious bugs introduced in version 1.8, caused by a failure to
  run through our PDF test suite before releasing that version.

- Fix bug in NullObject reading and writing.


Version 1.8, 2006-12-14
-----------------------

- Add support for decryption with the standard PDF security handler. This
  allows for decrypting PDF files given the proper user or owner password.

- Add support for encryption with the standard PDF security handler.

- Add new pythondoc documentation.

- Fix bug in ASCII85 decode that occurs when whitespace exists inside the
  two terminating characters of the stream.


Version 1.7, 2006-12-10
-----------------------

- Fix a bug when using a single page object in two PdfFileWriter objects.

- Adjust PyPDF to be tolerant of whitespace characters that don't belong
  inside a stream object.

- Add documentInfo property to PdfFileReader.

- Add numPages property to PdfFileReader.

- Add pages property to PdfFileReader.

- Add extractText function to PdfFileReader.


Version 1.6, 2006-06-06
-----------------------

- Add basic support for comments in PDF files. This allows us to read some
  ReportLab PDFs that could not be read before.

- Add "auto-repair" for finding the xref table at slightly bad locations.

- New StreamObject backend, cleaner and more powerful. Allows the use of
  stream filters more easily, including compressed streams.

- Add a graphics state push/pop around page merges. Improves quality of
  page merges when one page's content stream leaves the graphics state
  in an abnormal state.

- Add PageObject.compressContentStreams function, which filters all content
  streams and compresses them. This will reduce the size of PDF pages,
  especially after they have been decompressed in a mergePage operation.

- Support inline images in PDF content streams.

- Add support for using .NET framework compression when zlib is not
  available. This does not make pyPdf compatible with IronPython, but it
  is a first step.

- Add support for reading the document information dictionary, and extracting
  title, author, subject, producer and creator tags.

- Add patch to support NullObject and multiple xref streams, from Bradley
  Lawrence.


Version 1.5, 2006-01-28
-----------------------

- Fix a bug where merging pages did not work in "no-rename" cases when the
  second page has an array of content streams.

- Remove some debugging output that should not have been present.


Version 1.4, 2006-01-27
-----------------------

- Add capability to merge pages from multiple PDF files into a single page
  using the PageObject.mergePage function. See example code (README or web
  site) for more information.

- Add ability to modify a page's MediaBox, CropBox, BleedBox, TrimBox, and
  ArtBox properties through PageObject. See example code (README or web site)
  for more information.

- Refactor pdf.py into multiple files: generic.py (contains objects like
  NameObject, DictionaryObject), filters.py (contains filter code),
  utils.py (various). This does not affect importing PdfFileReader
  or PdfFileWriter.

- Add new decoding functions for standard PDF filters ASCIIHexDecode and
  ASCII85Decode.

- Change url and download_url to refer to the new pybrary.net web site.


Version 1.3, 2006-01-23
-----------------------

- Fix a new bug introduced in 1.2 where PDF files with \r line endings did
  not work properly anymore. A new test suite developed with various PDF
  files should prevent regression bugs from now on.

- Fix a bug where inheriting attributes from page nodes did not work.


Version 1.2, 2006-01-23
-----------------------

- Improved support for files with CRLF-based line endings, fixing a commonly
  reported problem stating "assertion error: assert line == "%%EOF"".

- Software author/maintainer is now officially a proud married person, which
  is sure to result in better software... somehow.


Version 1.1, 2006-01-18
-----------------------

- Add capability to rotate pages.

- Improved PDF reading support to properly manage inherited attributes from
  /Type=/Pages nodes. This means that page groups that are rotated or have
  different media boxes or whatever will now work properly.

- Added PDF 1.5 support. Namely cross-reference streams and object streams.
  This release can mangle Adobe's PDFReference16.pdf successfully.


Version 1.0, 2006-01-17
-----------------------

- First distutils-capable true public release. Supports a wide variety of PDF
  files that I found sitting around on my system.

- Does not support some PDF 1.5 features, such as object streams and
  cross-reference streams.

@ -0,0 +1,28 @@
Copyright (c) 2006-2008, Mathieu Fenniak
Some contributions copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright notice,
  this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.
* The name of the author may not be used to endorse or promote products
  derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

@ -0,0 +1 @@
include CHANGELOG

@ -0,0 +1,4 @@
from pdf import PdfFileReader, PdfFileWriter
from merger import PdfFileMerger

__all__ = ["pdf", "PdfFileMerger"]

@ -0,0 +1,252 @@
# vim: sw=4:expandtab:foldmethod=marker
#
# Copyright (c) 2006, Mathieu Fenniak
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * The name of the author may not be used to endorse or promote products
#   derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.


"""
Implementation of stream filters for PDF.
"""
__author__ = "Mathieu Fenniak"
__author_email__ = "biziqe@mathieu.fenniak.net"

from utils import PdfReadError
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

try:
    import zlib

    def decompress(data):
        return zlib.decompress(data)

    def compress(data):
        return zlib.compress(data)
except ImportError:
    # Unable to import zlib.  Attempt to use the System.IO.Compression
    # library from the .NET framework.  (IronPython only.)
    import System
    from System import IO, Collections, Array

    def _string_to_bytearr(buf):
        retval = Array.CreateInstance(System.Byte, len(buf))
        for i in range(len(buf)):
            retval[i] = ord(buf[i])
        return retval

    def _bytearr_to_string(bytes):
        retval = ""
        for i in range(bytes.Length):
            retval += chr(bytes[i])
        return retval

    def _read_bytes(stream):
        ms = IO.MemoryStream()
        buf = Array.CreateInstance(System.Byte, 2048)
        while True:
            bytes = stream.Read(buf, 0, buf.Length)
            if bytes == 0:
                break
            else:
                ms.Write(buf, 0, bytes)
        retval = ms.ToArray()
        ms.Close()
        return retval

    def decompress(data):
        bytes = _string_to_bytearr(data)
        ms = IO.MemoryStream()
        ms.Write(bytes, 0, bytes.Length)
        ms.Position = 0  # fseek 0
        gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Decompress)
        bytes = _read_bytes(gz)
        retval = _bytearr_to_string(bytes)
        gz.Close()
        return retval

    def compress(data):
        bytes = _string_to_bytearr(data)
        ms = IO.MemoryStream()
        gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Compress, True)
        gz.Write(bytes, 0, bytes.Length)
        gz.Close()
        ms.Position = 0  # fseek 0
        bytes = ms.ToArray()
        retval = _bytearr_to_string(bytes)
        ms.Close()
        return retval


class FlateDecode(object):
    def decode(data, decodeParms):
        data = decompress(data)
        predictor = 1
        if decodeParms:
            predictor = decodeParms.get("/Predictor", 1)
        # predictor 1 == no predictor
        if predictor != 1:
            columns = decodeParms["/Columns"]
            # PNG prediction:
            if predictor >= 10 and predictor <= 15:
                output = StringIO()
                # PNG prediction can vary from row to row
                rowlength = columns + 1
                assert len(data) % rowlength == 0
                prev_rowdata = (0,) * rowlength
                for row in xrange(len(data) / rowlength):
                    rowdata = [ord(x) for x in data[(row*rowlength):((row+1)*rowlength)]]
                    filterByte = rowdata[0]
                    if filterByte == 0:
                        pass
                    elif filterByte == 1:
                        # "Sub" filter: delta from the byte to the left
                        for i in range(2, rowlength):
                            rowdata[i] = (rowdata[i] + rowdata[i-1]) % 256
                    elif filterByte == 2:
                        # "Up" filter: delta from the byte in the previous row
                        for i in range(1, rowlength):
                            rowdata[i] = (rowdata[i] + prev_rowdata[i]) % 256
                    else:
                        # unsupported PNG filter
                        raise PdfReadError("Unsupported PNG filter %r" % filterByte)
                    prev_rowdata = rowdata
                    output.write(''.join([chr(x) for x in rowdata[1:]]))
                data = output.getvalue()
            else:
                # unsupported predictor
                raise PdfReadError("Unsupported flatedecode predictor %r" % predictor)
        return data
    decode = staticmethod(decode)

    def encode(data):
        return compress(data)
    encode = staticmethod(encode)
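The PNG-predictor loop above reverses per-row filters by adding the stored deltas back modulo 256. A standalone sketch of the idea for the type-1 "Sub" filter (plain illustrative Python, not part of this commit):

```python
# Reversing the PNG "Sub" filter (type 1): each filtered byte stores the
# difference from its left neighbour, so decoding adds neighbours back mod 256.
raw = [10, 200, 30, 255]
filtered = [raw[0]] + [(raw[i] - raw[i - 1]) % 256 for i in range(1, len(raw))]

recovered = [filtered[0]]
for byte in filtered[1:]:
    recovered.append((recovered[-1] + byte) % 256)

assert recovered == raw
```

The real decoder does the same arithmetic, except that each row carries its filter type in a leading byte and the "Up" filter adds bytes from the previous row instead.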


class ASCIIHexDecode(object):
    def decode(data, decodeParms=None):
        retval = ""
        char = ""
        x = 0
        while True:
            c = data[x]
            if c == ">":
                break
            elif c.isspace():
                x += 1
                continue
            char += c
            if len(char) == 2:
                retval += chr(int(char, base=16))
                char = ""
            x += 1
        assert char == ""
        return retval
    decode = staticmethod(decode)
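The decoder above pairs up hex digits, skipping whitespace and stopping at the ">" terminator. The same result can be cross-checked against the standard library (Python's binascii, not something this module uses):

```python
import binascii

stream = "61\n626\n3>"  # hex digits may be split by whitespace; ">" terminates
hexbody = "".join(stream[:stream.index(">")].split())
assert binascii.unhexlify(hexbody) == b"abc"
```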


class ASCII85Decode(object):
    def decode(data, decodeParms=None):
        retval = ""
        group = []
        x = 0
        hitEod = False
        # remove all whitespace from data
        data = [y for y in data if not (y in ' \n\r\t')]
        while not hitEod:
            c = data[x]
            if len(retval) == 0 and c == "<" and data[x+1] == "~":
                x += 2
                continue
            #elif c.isspace():
            #    x += 1
            #    continue
            elif c == 'z':
                assert len(group) == 0
                retval += '\x00\x00\x00\x00'
                x += 1  # advance past 'z'; without this the loop never terminates
                continue
            elif c == "~" and data[x+1] == ">":
                if len(group) != 0:
                    # cannot have a final group of just 1 char
                    assert len(group) > 1
                    cnt = len(group) - 1
                    group += [ 85, 85, 85 ]
                    hitEod = cnt
                else:
                    break
            else:
                c = ord(c) - 33
                assert c >= 0 and c < 85
                group += [ c ]
            if len(group) >= 5:
                b = group[0] * (85**4) + \
                    group[1] * (85**3) + \
                    group[2] * (85**2) + \
                    group[3] * 85 + \
                    group[4]
                assert b < (2**32 - 1)
                c4 = chr((b >> 0) % 256)
                c3 = chr((b >> 8) % 256)
                c2 = chr((b >> 16) % 256)
                c1 = chr(b >> 24)
                retval += (c1 + c2 + c3 + c4)
                if hitEod:
                    retval = retval[:-4+hitEod]
                group = []
            x += 1
        return retval
    decode = staticmethod(decode)


def decodeStreamData(stream):
    from generic import NameObject
    filters = stream.get("/Filter", ())
    if len(filters) and not isinstance(filters[0], NameObject):
        # we have a single filter instance
        filters = (filters,)
    data = stream._data
    for filterType in filters:
        if filterType == "/FlateDecode":
            data = FlateDecode.decode(data, stream.get("/DecodeParms"))
        elif filterType == "/ASCIIHexDecode":
            data = ASCIIHexDecode.decode(data)
        elif filterType == "/ASCII85Decode":
            data = ASCII85Decode.decode(data)
        elif filterType == "/Crypt":
            decodeParams = stream.get("/DecodeParms", {})  # spec key is /DecodeParms
            if "/Name" not in decodeParams and "/Type" not in decodeParams:
                pass
            else:
                raise NotImplementedError("/Crypt filter with /Name or /Type not supported yet")
        else:
            # unsupported filter
            raise NotImplementedError("unsupported filter %s" % filterType)
    return data
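decodeStreamData applies each named filter in sequence, feeding the output of one stage into the next. A toy illustration of that chaining using only stdlib codecs (Python 3's base64 and zlib stand in for the classes above; the content-stream bytes are made up):

```python
import base64
import zlib

original = b"BT /F1 12 Tf (Hello) Tj ET"  # hypothetical page content stream

# Encoding order: deflate first, then ASCII85-armor (Adobe style, <~ ... ~>).
encoded = base64.a85encode(zlib.compress(original), adobe=True)

# Decoding runs the stages in reverse: un-armor first, then inflate.
# In a PDF this matches the order the names appear in the /Filter array.
decoded = zlib.decompress(base64.a85decode(encoded, adobe=True))
assert decoded == original
```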


if __name__ == "__main__":
    assert "abc" == ASCIIHexDecode.decode('61\n626\n3>')

    ascii85Test = """
<~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
O<DJ+*.@<*K0@<6L(Df-\\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
>uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>
"""
    ascii85_originalText = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
    assert ASCII85Decode.decode(ascii85Test) == ascii85_originalText

File diff suppressed because it is too large

@ -0,0 +1,401 @@
# vim: sw=4:expandtab:foldmethod=marker
#
# Copyright (c) 2006, Mathieu Fenniak
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * The name of the author may not be used to endorse or promote products
#   derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

from generic import *
from pdf import PdfFileReader, PdfFileWriter, Destination


class _MergedPage(object):
    """
    _MergedPage is used internally by PdfFileMerger to collect necessary
    information on each page that is being merged.
    """
    def __init__(self, pagedata, src, id):
        self.src = src
        self.pagedata = pagedata
        self.out_pagedata = None
        self.id = id


class PdfFileMerger(object):
    """
    PdfFileMerger merges multiple PDFs into a single PDF.  It can concatenate,
    slice, insert, or any combination of the above.

    See the functions "merge" (or "append") and "write" (or "overwrite") for
    usage information.
    """

    def __init__(self):
        """
        >>> PdfFileMerger()

        Initializes a PdfFileMerger; no parameters are required.
        """
        self.inputs = []
        self.pages = []
        self.output = PdfFileWriter()
        self.bookmarks = []
        self.named_dests = []
        self.id_count = 0

    def merge(self, position, fileobj, bookmark=None, pages=None, import_bookmarks=True):
        """
        >>> merge(position, file, bookmark=None, pages=None, import_bookmarks=True)

        Merges the pages from the source document specified by "file" into the
        output file at the page number specified by "position".

        Optionally, you may specify a bookmark to be applied at the beginning
        of the included file by supplying the text of the bookmark in the
        "bookmark" parameter.

        You may prevent the source document's bookmarks from being imported by
        specifying "import_bookmarks" as False.

        You may also use the "pages" parameter to merge only the specified
        range of pages from the source document into the output document.
        """

        my_file = False
        if type(fileobj) in (str, unicode):
            fileobj = file(fileobj, 'rb')
            my_file = True

        if type(fileobj) == PdfFileReader:
            pdfr = fileobj
            fileobj = pdfr.file
        else:
            pdfr = PdfFileReader(fileobj)

        # Find the range of pages to merge
        if pages == None:
            pages = (0, pdfr.getNumPages())
        elif type(pages) in (int, float, str, unicode):
            raise TypeError('"pages" must be a tuple of (start, end)')

        srcpages = []

        if bookmark:
            bookmark = Bookmark(TextStringObject(bookmark), NumberObject(self.id_count), NameObject('/Fit'))

        outline = []
        if import_bookmarks:
            outline = pdfr.getOutlines()
            outline = self._trim_outline(pdfr, outline, pages)

        if bookmark:
            self.bookmarks += [bookmark, outline]
        else:
            self.bookmarks += outline

        dests = pdfr.namedDestinations
        dests = self._trim_dests(pdfr, dests, pages)
        self.named_dests += dests

        # Gather all the pages that are going to be merged
        for i in range(*pages):
            pg = pdfr.getPage(i)

            id = self.id_count
            self.id_count += 1

            mp = _MergedPage(pg, pdfr, id)

            srcpages.append(mp)

        self._associate_dests_to_pages(srcpages)
        self._associate_bookmarks_to_pages(srcpages)

        # Slice to insert the pages at the specified position
        self.pages[position:position] = srcpages

        # Keep track of our input files so we can close them later
        self.inputs.append((fileobj, pdfr, my_file))
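The merge method places the gathered pages with slice assignment, which splices a whole list in at a position without overwriting anything. A minimal demonstration of that idiom on plain lists:

```python
pages = ["p0", "p1", "p2"]
incoming = ["a", "b"]
position = 1

# A zero-width slice assignment inserts every element of "incoming"
# before index 1; no existing element is replaced.
pages[position:position] = incoming
assert pages == ["p0", "a", "b", "p1", "p2"]
```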

    def append(self, fileobj, bookmark=None, pages=None, import_bookmarks=True):
        """
        >>> append(file, bookmark=None, pages=None, import_bookmarks=True)

        Identical to the "merge" function, but assumes you want to concatenate
        all pages onto the end of the file instead of specifying a position.
        """

        self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)

    def write(self, fileobj):
        """
        >>> write(file)

        Writes all data that has been merged to "file" (which can be a
        filename or any kind of file-like object).
        """
        my_file = False
        if type(fileobj) in (str, unicode):
            fileobj = file(fileobj, 'wb')
            my_file = True

        # Add pages to the PdfFileWriter
        for page in self.pages:
            self.output.addPage(page.pagedata)
            page.out_pagedata = self.output.getReference(self.output._pages.getObject()["/Kids"][-1].getObject())

        # Once all pages are added, create bookmarks to point at those pages
        self._write_dests()
        self._write_bookmarks()

        # Write the output to the file
        self.output.write(fileobj)

        if my_file:
            fileobj.close()

    def close(self):
        """
        >>> close()

        Shuts all file descriptors (input and output) and clears all memory
        usage.
        """
        self.pages = []
        for fo, pdfr, mine in self.inputs:
            if mine:
                fo.close()

        self.inputs = []
        self.output = None

    def _trim_dests(self, pdf, dests, pages):
        """
        Removes any named destinations that are not a part of the specified
        page set.
        """
        new_dests = []
        for k, o in dests.items():
            for j in range(*pages):
                if pdf.getPage(j).getObject() == o['/Page'].getObject():
                    o[NameObject('/Page')] = o['/Page'].getObject()
                    assert str(k) == str(o['/Title'])
                    new_dests.append(o)
                    break
        return new_dests

    def _trim_outline(self, pdf, outline, pages):
        """
        Removes any outline/bookmark entries that are not a part of the
        specified page set.
        """
        new_outline = []
        prev_header_added = True
        for i, o in enumerate(outline):
            if type(o) == list:
                sub = self._trim_outline(pdf, o, pages)
                if sub:
                    if not prev_header_added:
                        new_outline.append(outline[i-1])
                    new_outline.append(sub)
            else:
                prev_header_added = False
                for j in range(*pages):
                    if pdf.getPage(j).getObject() == o['/Page'].getObject():
                        o[NameObject('/Page')] = o['/Page'].getObject()
                        new_outline.append(o)
                        prev_header_added = True
                        break
        return new_outline
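_trim_outline walks a nested list of bookmark entries, keeping an entry when its page survives and keeping a sub-list only when it is non-empty after trimming, re-attaching the header entry that immediately precedes it. A toy version of the same recursion on plain strings (the outline values and "keep" set are invented for illustration):

```python
def trim(outline, keep):
    # Keep leaf entries found in "keep"; recurse into nested lists, and when
    # a trimmed sub-list survives, re-attach its preceding header entry if
    # that header was itself dropped.
    result = []
    prev_header_added = True
    for i, o in enumerate(outline):
        if isinstance(o, list):
            sub = trim(o, keep)
            if sub:
                if not prev_header_added:
                    result.append(outline[i - 1])
                result.append(sub)
        else:
            prev_header_added = False
            if o in keep:
                result.append(o)
                prev_header_added = True
    return result

outline = ["ch1", ["s1a", "s1b"], "ch2", ["s2a"]]
assert trim(outline, {"s1a", "ch2"}) == ["ch1", ["s1a"], "ch2"]
```

"ch1" is dropped on its own but restored because its surviving child "s1a" needs a parent header, mirroring the `prev_header_added` bookkeeping above.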
|
||||||
|
|
||||||
|
def _write_dests(self):
|
||||||
|
dests = self.named_dests
|
||||||
|
|
||||||
|
for v in dests:
|
||||||
|
pageno = None
|
||||||
|
pdf = None
|
||||||
|
if v.has_key('/Page'):
|
||||||
|
for i, p in enumerate(self.pages):
|
||||||
|
if p.id == v['/Page']:
|
||||||
|
v[NameObject('/Page')] = p.out_pagedata
|
||||||
|
pageno = i
|
||||||
|
pdf = p.src
|
||||||
|
if pageno != None:
|
||||||
|
self.output.addNamedDestinationObject(v)
|
||||||
|
|
||||||
|
def _write_bookmarks(self, bookmarks=None, parent=None):
|
||||||
|
|
||||||
|
if bookmarks == None:
|
||||||
|
bookmarks = self.bookmarks
|
||||||
|
|
||||||
|
|
||||||
|
last_added = None
|
||||||
|
for b in bookmarks:
|
||||||
|
if type(b) == list:
|
||||||
|
self._write_bookmarks(b, last_added)
|
||||||
|
continue
|
||||||
|
|
||||||
|
pageno = None
|
||||||
|
pdf = None
|
||||||
|
if b.has_key('/Page'):
|
||||||
|
for i, p in enumerate(self.pages):
|
||||||
|
if p.id == b['/Page']:
|
||||||
|
b[NameObject('/Page')] = p.out_pagedata
|
||||||
|
pageno = i
|
||||||
|
pdf = p.src
|
||||||
|
if pageno != None:
|
||||||
|
last_added = self.output.addBookmarkDestination(b, parent)
|
||||||
|
|
||||||
|
|
||||||
|
    def _associate_dests_to_pages(self, pages):
        for nd in self.named_dests:
            pageno = None
            np = nd['/Page']

            if type(np) == NumberObject:
                continue

            for p in pages:
                if np.getObject() == p.pagedata.getObject():
                    pageno = p.id

            if pageno != None:
                nd[NameObject('/Page')] = NumberObject(pageno)
            else:
                raise ValueError, "Unresolved named destination '%s'" % (nd['/Title'],)

    def _associate_bookmarks_to_pages(self, pages, bookmarks=None):
        if bookmarks == None:
            bookmarks = self.bookmarks

        for b in bookmarks:
            if type(b) == list:
                self._associate_bookmarks_to_pages(pages, b)
                continue

            pageno = None
            bp = b['/Page']

            if type(bp) == NumberObject:
                continue

            for p in pages:
                if bp.getObject() == p.pagedata.getObject():
                    pageno = p.id

            if pageno != None:
                b[NameObject('/Page')] = NumberObject(pageno)
            else:
                raise ValueError, "Unresolved bookmark '%s'" % (b['/Title'],)

    def findBookmark(self, bookmark, root=None):
        if root == None:
            root = self.bookmarks

        for i, b in enumerate(root):
            if type(b) == list:
                res = self.findBookmark(bookmark, b)
                if res:
                    return [i] + res
            elif b == bookmark or b['/Title'] == bookmark:
                return [i]

        return None

    def addBookmark(self, title, pagenum, parent=None):
        """
        Add a bookmark to the pdf, using the specified title and pointing at
        the specified page number. A parent can be specified to make this a
        nested bookmark below the parent.
        """
        if parent == None:
            iloc = [len(self.bookmarks)-1]
        elif type(parent) == list:
            iloc = parent
        else:
            iloc = self.findBookmark(parent)

        dest = Bookmark(TextStringObject(title), NumberObject(pagenum), NameObject('/FitH'), NumberObject(826))

        if parent == None:
            self.bookmarks.append(dest)
        else:
            bmparent = self.bookmarks
            for i in iloc[:-1]:
                bmparent = bmparent[i]
            npos = iloc[-1]+1
            if npos < len(bmparent) and type(bmparent[npos]) == list:
                bmparent[npos].append(dest)
            else:
                bmparent.insert(npos, [dest])

    def addNamedDestination(self, title, pagenum):
        """
        Add a destination to the pdf, using the specified title and pointing
        at the specified page number.
        """
        dest = Destination(TextStringObject(title), NumberObject(pagenum), NameObject('/FitH'), NumberObject(826))
        self.named_dests.append(dest)


class OutlinesObject(list):
    def __init__(self, pdf, tree, parent=None):
        list.__init__(self)
        self.tree = tree
        self.pdf = pdf
        self.parent = parent

    def remove(self, index):
        obj = self[index]
        del self[index]
        self.tree.removeChild(obj)

    def add(self, title, page):
        pageRef = self.pdf.getObject(self.pdf._pages)['/Kids'][page]
        action = DictionaryObject()
        action.update({
            NameObject('/D') : ArrayObject([pageRef, NameObject('/FitH'), NumberObject(826)]),
            NameObject('/S') : NameObject('/GoTo')
        })
        actionRef = self.pdf._addObject(action)
        bookmark = TreeObject()

        bookmark.update({
            NameObject('/A') : actionRef,
            NameObject('/Title') : createStringObject(title),
        })

        self.pdf._addObject(bookmark)

        self.tree.addChild(bookmark)

    def removeAll(self):
        for child in [x for x in self.tree.children()]:
            self.tree.removeChild(child)
            self.pop()
File diff suppressed because it is too large
@@ -0,0 +1,125 @@

# vim: sw=4:expandtab:foldmethod=marker
#
# Copyright (c) 2006, Mathieu Fenniak
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright notice,
#   this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
#   this list of conditions and the following disclaimer in the documentation
#   and/or other materials provided with the distribution.
# * The name of the author may not be used to endorse or promote products
#   derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.


"""
Utility functions for PDF library.
"""
__author__ = "Mathieu Fenniak"
__author_email__ = "biziqe@mathieu.fenniak.net"

#ENABLE_PSYCO = False
#if ENABLE_PSYCO:
#    try:
#        import psyco
#    except ImportError:
#        ENABLE_PSYCO = False
#
#if not ENABLE_PSYCO:
#    class psyco:
#        def proxy(func):
#            return func
#        proxy = staticmethod(proxy)

def readUntilWhitespace(stream, maxchars=None):
    txt = ""
    while True:
        tok = stream.read(1)
        if tok.isspace() or not tok:
            break
        txt += tok
        if len(txt) == maxchars:
            break
    return txt

def readNonWhitespace(stream):
    tok = ' '
    while tok == '\n' or tok == '\r' or tok == ' ' or tok == '\t':
        tok = stream.read(1)
    return tok

class ConvertFunctionsToVirtualList(object):
    def __init__(self, lengthFunction, getFunction):
        self.lengthFunction = lengthFunction
        self.getFunction = getFunction

    def __len__(self):
        return self.lengthFunction()

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError, "sequence indices must be integers"
        len_self = len(self)
        if index < 0:
            # support negative indexes
            index = len_self + index
        if index < 0 or index >= len_self:
            raise IndexError, "sequence index out of range"
        return self.getFunction(index)
|
||||||
|
|
||||||
|
def RC4_encrypt(key, plaintext):
|
||||||
|
S = [i for i in range(256)]
|
||||||
|
j = 0
|
||||||
|
for i in range(256):
|
||||||
|
j = (j + S[i] + ord(key[i % len(key)])) % 256
|
||||||
|
S[i], S[j] = S[j], S[i]
|
||||||
|
i, j = 0, 0
|
||||||
|
retval = ""
|
||||||
|
for x in range(len(plaintext)):
|
||||||
|
i = (i + 1) % 256
|
||||||
|
j = (j + S[i]) % 256
|
||||||
|
S[i], S[j] = S[j], S[i]
|
||||||
|
t = S[(S[i] + S[j]) % 256]
|
||||||
|
retval += chr(ord(plaintext[x]) ^ t)
|
||||||
|
return retval
|
||||||
|
|
||||||
|
def matrixMultiply(a, b):
|
||||||
|
return [[sum([float(i)*float(j)
|
||||||
|
for i, j in zip(row, col)]
|
||||||
|
) for col in zip(*b)]
|
||||||
|
for row in a]
|
||||||
|
|
||||||
|
class PyPdfError(Exception):
|
||||||
|
pass
|
||||||
|
|
||||||
|
class PdfReadError(PyPdfError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
class PageSizeNotDefinedError(PyPdfError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
class PdfReadWarning(UserWarning):
|
||||||
|
pass
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
# test RC4
|
||||||
|
out = RC4_encrypt("Key", "Plaintext")
|
||||||
|
print repr(out)
|
||||||
|
pt = RC4_encrypt("Key", out)
|
||||||
|
print repr(pt)
|
|
@ -0,0 +1,355 @@
|
||||||
|
import re
import datetime
import decimal
from generic import PdfObject
from xml.dom import getDOMImplementation
from xml.dom.minidom import parseString

RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
XMP_NAMESPACE = "http://ns.adobe.com/xap/1.0/"
PDF_NAMESPACE = "http://ns.adobe.com/pdf/1.3/"
XMPMM_NAMESPACE = "http://ns.adobe.com/xap/1.0/mm/"

# What is the PDFX namespace, you might ask? I might ask that too. It's
# a completely undocumented namespace used to place "custom metadata"
# properties, which are arbitrary metadata properties with no semantic or
# documented meaning. Elements in the namespace are key/value-style storage,
# where the element name is the key and the content is the value. The keys
# are transformed into valid XML identifiers by substituting an invalid
# identifier character with \u2182 followed by the unicode hex ID of the
# original character. A key like "my car" is therefore "my\u21820020car".
#
# \u2182, in case you're wondering, is the unicode character
# \u{ROMAN NUMERAL TEN THOUSAND}, a straightforward and obvious choice for
# escaping characters.
#
# Intentional users of the pdfx namespace should be shot on sight. A
# custom data schema and sensible XML elements could be used instead, as is
# suggested by Adobe's own documentation on XMP (under "Extensibility of
# Schemas").
#
# Information presented here on the /pdfx/ schema is a result of limited
# reverse engineering, and does not constitute a full specification.
PDFX_NAMESPACE = "http://ns.adobe.com/pdfx/1.3/"

iso8601 = re.compile("""
        (?P<year>[0-9]{4})
        (-
            (?P<month>[0-9]{2})
            (-
                (?P<day>[0-9]+)
                (T
                    (?P<hour>[0-9]{2}):
                    (?P<minute>[0-9]{2})
                    (:(?P<second>[0-9]{2}(.[0-9]+)?))?
                    (?P<tzd>Z|[-+][0-9]{2}:[0-9]{2})
                )?
            )?
        )?
        """, re.VERBOSE)

##
# An object that represents Adobe XMP metadata.
class XmpInformation(PdfObject):

    def __init__(self, stream):
        self.stream = stream
        docRoot = parseString(self.stream.getData())
        self.rdfRoot = docRoot.getElementsByTagNameNS(RDF_NAMESPACE, "RDF")[0]
        self.cache = {}

    def writeToStream(self, stream, encryption_key):
        self.stream.writeToStream(stream, encryption_key)

    def getElement(self, aboutUri, namespace, name):
        for desc in self.rdfRoot.getElementsByTagNameNS(RDF_NAMESPACE, "Description"):
            if desc.getAttributeNS(RDF_NAMESPACE, "about") == aboutUri:
                attr = desc.getAttributeNodeNS(namespace, name)
                if attr != None:
                    yield attr
                for element in desc.getElementsByTagNameNS(namespace, name):
                    yield element

    def getNodesInNamespace(self, aboutUri, namespace):
        for desc in self.rdfRoot.getElementsByTagNameNS(RDF_NAMESPACE, "Description"):
            if desc.getAttributeNS(RDF_NAMESPACE, "about") == aboutUri:
                for i in range(desc.attributes.length):
                    attr = desc.attributes.item(i)
                    if attr.namespaceURI == namespace:
                        yield attr
                for child in desc.childNodes:
                    if child.namespaceURI == namespace:
                        yield child

    def _getText(self, element):
        text = ""
        for child in element.childNodes:
            if child.nodeType == child.TEXT_NODE:
                text += child.data
        return text

    def _converter_string(value):
        return value

    def _converter_date(value):
        m = iso8601.match(value)
        year = int(m.group("year"))
        month = int(m.group("month") or "1")
        day = int(m.group("day") or "1")
        hour = int(m.group("hour") or "0")
        minute = int(m.group("minute") or "0")
        second = decimal.Decimal(m.group("second") or "0")
        seconds = second.to_integral(decimal.ROUND_FLOOR)
        milliseconds = (second - seconds) * 1000000
        tzd = m.group("tzd") or "Z"
        dt = datetime.datetime(year, month, day, hour, minute, seconds, milliseconds)
        if tzd != "Z":
            tzd_hours, tzd_minutes = [int(x) for x in tzd.split(":")]
            tzd_hours *= -1
            if tzd_hours < 0:
                tzd_minutes *= -1
            dt = dt + datetime.timedelta(hours=tzd_hours, minutes=tzd_minutes)
        return dt
    _test_converter_date = staticmethod(_converter_date)

    def _getter_bag(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            retval = []
            for element in self.getElement("", namespace, name):
                bags = element.getElementsByTagNameNS(RDF_NAMESPACE, "Bag")
                if len(bags):
                    for bag in bags:
                        for item in bag.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
                            value = self._getText(item)
                            value = converter(value)
                            retval.append(value)
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = retval
            return retval
        return get

    def _getter_seq(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            retval = []
            for element in self.getElement("", namespace, name):
                seqs = element.getElementsByTagNameNS(RDF_NAMESPACE, "Seq")
                if len(seqs):
                    for seq in seqs:
                        for item in seq.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
                            value = self._getText(item)
                            value = converter(value)
                            retval.append(value)
                else:
                    value = converter(self._getText(element))
                    retval.append(value)
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = retval
            return retval
        return get

    def _getter_langalt(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            retval = {}
            for element in self.getElement("", namespace, name):
                alts = element.getElementsByTagNameNS(RDF_NAMESPACE, "Alt")
                if len(alts):
                    for alt in alts:
                        for item in alt.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
                            value = self._getText(item)
                            value = converter(value)
                            retval[item.getAttribute("xml:lang")] = value
                else:
                    retval["x-default"] = converter(self._getText(element))
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = retval
            return retval
        return get

    def _getter_single(namespace, name, converter):
        def get(self):
            cached = self.cache.get(namespace, {}).get(name)
            if cached:
                return cached
            value = None
            for element in self.getElement("", namespace, name):
                if element.nodeType == element.ATTRIBUTE_NODE:
                    value = element.nodeValue
                else:
                    value = self._getText(element)
                break
            if value != None:
                value = converter(value)
            ns_cache = self.cache.setdefault(namespace, {})
            ns_cache[name] = value
            return value
        return get

    ##
    # Contributors to the resource (other than the authors). An unsorted
    # array of names.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_contributor = property(_getter_bag(DC_NAMESPACE, "contributor", _converter_string))

    ##
    # Text describing the extent or scope of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_coverage = property(_getter_single(DC_NAMESPACE, "coverage", _converter_string))

    ##
    # A sorted array of names of the authors of the resource, listed in order
    # of precedence.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_creator = property(_getter_seq(DC_NAMESPACE, "creator", _converter_string))

    ##
    # A sorted array of dates (datetime.datetime instances) of significance to
    # the resource. The dates and times are in UTC.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_date = property(_getter_seq(DC_NAMESPACE, "date", _converter_date))

    ##
    # A language-keyed dictionary of textual descriptions of the content of the
    # resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_description = property(_getter_langalt(DC_NAMESPACE, "description", _converter_string))

    ##
    # The mime-type of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_format = property(_getter_single(DC_NAMESPACE, "format", _converter_string))

    ##
    # Unique identifier of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_identifier = property(_getter_single(DC_NAMESPACE, "identifier", _converter_string))

    ##
    # An unordered array specifying the languages used in the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_language = property(_getter_bag(DC_NAMESPACE, "language", _converter_string))

    ##
    # An unordered array of publisher names.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_publisher = property(_getter_bag(DC_NAMESPACE, "publisher", _converter_string))

    ##
    # An unordered array of text descriptions of relationships to other
    # documents.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_relation = property(_getter_bag(DC_NAMESPACE, "relation", _converter_string))

    ##
    # A language-keyed dictionary of textual descriptions of the rights the
    # user has to this resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_rights = property(_getter_langalt(DC_NAMESPACE, "rights", _converter_string))

    ##
    # Unique identifier of the work from which this resource was derived.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_source = property(_getter_single(DC_NAMESPACE, "source", _converter_string))

    ##
    # An unordered array of descriptive phrases or keywords that specify the
    # topic of the content of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_subject = property(_getter_bag(DC_NAMESPACE, "subject", _converter_string))

    ##
    # A language-keyed dictionary of the title of the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_title = property(_getter_langalt(DC_NAMESPACE, "title", _converter_string))

    ##
    # An unordered array of textual descriptions of the document type.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    dc_type = property(_getter_bag(DC_NAMESPACE, "type", _converter_string))

    ##
    # An unformatted text string representing document keywords.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    pdf_keywords = property(_getter_single(PDF_NAMESPACE, "Keywords", _converter_string))

    ##
    # The PDF file version, for example 1.0, 1.3.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    pdf_pdfversion = property(_getter_single(PDF_NAMESPACE, "PDFVersion", _converter_string))

    ##
    # The name of the tool that created the PDF document.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    pdf_producer = property(_getter_single(PDF_NAMESPACE, "Producer", _converter_string))

    ##
    # The date and time the resource was originally created. The date and
    # time are returned as a UTC datetime.datetime object.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_createDate = property(_getter_single(XMP_NAMESPACE, "CreateDate", _converter_date))

    ##
    # The date and time the resource was last modified. The date and time
    # are returned as a UTC datetime.datetime object.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_modifyDate = property(_getter_single(XMP_NAMESPACE, "ModifyDate", _converter_date))

    ##
    # The date and time that any metadata for this resource was last
    # changed. The date and time are returned as a UTC datetime.datetime
    # object.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_metadataDate = property(_getter_single(XMP_NAMESPACE, "MetadataDate", _converter_date))

    ##
    # The name of the first known tool used to create the resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmp_creatorTool = property(_getter_single(XMP_NAMESPACE, "CreatorTool", _converter_string))

    ##
    # The common identifier for all versions and renditions of this resource.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmpmm_documentId = property(_getter_single(XMPMM_NAMESPACE, "DocumentID", _converter_string))

    ##
    # An identifier for a specific incarnation of a document, updated each
    # time a file is saved.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    xmpmm_instanceId = property(_getter_single(XMPMM_NAMESPACE, "InstanceID", _converter_string))

    def custom_properties(self):
        if not hasattr(self, "_custom_properties"):
            self._custom_properties = {}
            for node in self.getNodesInNamespace("", PDFX_NAMESPACE):
                key = node.localName
                while True:
                    # see documentation about PDFX_NAMESPACE earlier in file
                    idx = key.find(u"\u2182")
                    if idx == -1:
                        break
                    key = key[:idx] + chr(int(key[idx+1:idx+5], base=16)) + key[idx+5:]
                if node.nodeType == node.ATTRIBUTE_NODE:
                    value = node.nodeValue
                else:
                    value = self._getText(node)
                self._custom_properties[key] = value
        return self._custom_properties

    ##
    # Retrieves custom metadata properties defined in the undocumented pdfx
    # metadata schema.
    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
    # @return Returns a dictionary of key/value items for custom metadata
    # properties.
    custom_properties = property(custom_properties)

@@ -0,0 +1,38 @@

Example:
|
||||||
|
|
||||||
|
from pyPdf import PdfFileWriter, PdfFileReader
|
||||||
|
|
||||||
|
output = PdfFileWriter()
|
||||||
|
input1 = PdfFileReader(file("document1.pdf", "rb"))
|
||||||
|
|
||||||
|
# add page 1 from input1 to output document, unchanged
|
||||||
|
output.addPage(input1.getPage(0))
|
||||||
|
|
||||||
|
# add page 2 from input1, but rotated clockwise 90 degrees
|
||||||
|
output.addPage(input1.getPage(1).rotateClockwise(90))
|
||||||
|
|
||||||
|
# add page 3 from input1, rotated the other way:
|
||||||
|
output.addPage(input1.getPage(2).rotateCounterClockwise(90))
|
||||||
|
# alt: output.addPage(input1.getPage(2).rotateClockwise(270))
|
||||||
|
|
||||||
|
# add page 4 from input1, but first add a watermark from another pdf:
|
||||||
|
page4 = input1.getPage(3)
|
||||||
|
watermark = PdfFileReader(file("watermark.pdf", "rb"))
|
||||||
|
page4.mergePage(watermark.getPage(0))
|
||||||
|
|
||||||
|
# add page 5 from input1, but crop it to half size:
|
||||||
|
page5 = input1.getPage(4)
|
||||||
|
page5.mediaBox.upperRight = (
|
||||||
|
page5.mediaBox.getUpperRight_x() / 2,
|
||||||
|
page5.mediaBox.getUpperRight_y() / 2
|
||||||
|
)
|
||||||
|
output.addPage(page5)
|
||||||
|
|
||||||
|
# print how many pages input1 has:
|
||||||
|
print "document1.pdf has %s pages." % input1.getNumPages())
|
||||||
|
|
||||||
|
# finally, write "output" to document-output.pdf
|
||||||
|
outputStream = file("document-output.pdf", "wb")
|
||||||
|
output.write(outputStream)
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,40 @@
|
||||||
|
#!/usr/bin/env python

from distutils.core import setup

long_description = """
A Pure-Python library built as a PDF toolkit. It is capable of:

- extracting document information (title, author, ...),
- splitting documents page by page,
- merging documents page by page,
- cropping pages,
- merging multiple pages into a single page,
- encrypting and decrypting PDF files.

By being Pure-Python, it should run on any Python platform without any
dependencies on external libraries. It can also work entirely on StringIO
objects rather than file streams, allowing for PDF manipulation in memory.
It is therefore a useful tool for websites that manage or manipulate PDFs.
"""

setup(
    name="pyPdf",
    version="1.12",
    description="PDF toolkit",
    long_description=long_description,
    author="Mathieu Fenniak",
    author_email="biziqe@mathieu.fenniak.net",
    url="http://pybrary.net/pyPdf/",
    download_url="http://pybrary.net/pyPdf/pyPdf-1.12.tar.gz",
    classifiers = [
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: BSD License",
        "Programming Language :: Python",
        "Operating System :: OS Independent",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    packages=["pyPdf"],
)