First commit

Original PyPDF code. Updates should be coming from Noah soon.
2011-12-30 09:04:56 -06:00
commit c59a212a4c
13 changed files with 4511 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
+*.pyc
+*.swp
--- a/205
+++ b/205
@@ -0,0 +1,205 @@
+Version 1.12, 2008-09-02
+------------------------
+
+ - Added support for XMP metadata.
+
+ - Fix reading files with xref streams with multiple /Index values.
+
+ - Fix extracting content streams that use graphics operators longer than 2
+   characters.  Affects merging PDF files.
+
+
+Version 1.11, 2008-05-09
+------------------------
+
+ - Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject
+   or FloatObject values.
+
+ - PDF compatibility fixes.
+
+ - Fix to read object xref stream in correct order.
+
+ - Fix for comments inside content streams.
+
+
+Version 1.10, 2007-10-04
+------------------------
+
+ - Text strings from PDF files are returned as Unicode string objects when
+ pyPdf determines that they can be decoded (as UTF-16 strings, or as
+ PDFDocEncoding strings).  Unicode objects are also written out when
+ necessary.  This means that string objects in pyPdf can be either
+ generic.ByteStringObject instances, or generic.TextStringObject instances.
+
+ - The extractText method now returns a unicode string object.
+
+ - All document information properties now return unicode string objects.  In
+ the event that a document provides docinfo properties that are not decoded by
+ pyPdf, the raw byte strings can be accessed with an "_raw" property (i.e.
+ title_raw rather than title).
+
+ - generic.DictionaryObject instances have been enhanced to be easier to use.
+ Values coming out of dictionary objects will automatically be de-referenced
+ (.getObject will be called on them), unless accessed by the new "raw_get"
+ method.  DictionaryObjects can now only contain PdfObject instances (as keys
+ and values), making it easier to debug where non-PdfObject values (which
+ cannot be written out) are entering dictionaries.
+
+ - Support for reading named destinations and outlines in PDF files.  Original
+ patch by Ashish Kulkarni.
+
+ - Stream compatibility reading enhancements for malformed PDF files.
+
+ - Cross reference table reading enhancements for malformed PDF files.
+
+ - Encryption documentation.
+
+ - Replace some "assert" statements with error raising.
+
+ - Minor optimizations to FlateDecode algorithm increase speed when using PNG
+ predictors.
+
+Version 1.9, 2006-12-15
+-----------------------
+
+ - Fix several serious bugs introduced in version 1.8, caused by a failure to
+   run through our PDF test suite before releasing that version.
+
+ - Fix bug in NullObject reading and writing.
+
+Version 1.8, 2006-12-14
+-----------------------
+
+ - Add support for decryption with the standard PDF security handler.  This
+   allows for decrypting PDF files given the proper user or owner password.
+
+ - Add support for encryption with the standard PDF security handler.
+
+ - Add new pythondoc documentation.
+
+ - Fix bug in ASCII85 decode that occurs when whitespace exists inside the
+   two terminating characters of the stream.
+
+Version 1.7, 2006-12-10
+-----------------------
+
+ - Fix a bug when using a single page object in two PdfFileWriter objects.
+
+ - Adjust PyPDF to be tolerant of whitespace characters that don't belong
+   during a stream object.
+
+ - Add documentInfo property to PdfFileReader.
+
+ - Add numPages property to PdfFileReader.
+
+ - Add pages property to PdfFileReader.
+
+ - Add extractText function to PdfFileReader.
+
+
+Version 1.6, 2006-06-06
+-----------------------
+
+ - Add basic support for comments in PDF files.  This allows us to read some
+   ReportLab PDFs that could not be read before.
+
+ - Add "auto-repair" for finding xref table at slightly bad locations.
+
+ - New StreamObject backend, cleaner and more powerful.  Allows the use of
+   stream filters more easily, including compressed streams.
+
+ - Add a graphics state push/pop around page merges.  Improves quality of
+   page merges when one page's content stream leaves the graphics state
+   in an abnormal condition.
+
+ - Add PageObject.compressContentStreams function, which filters all content
+   streams and compresses them.  This will reduce the size of PDF pages,
+   especially after they could have been decompressed in a mergePage
+   operation.
+
+ - Support inline images in PDF content streams.
+
+ - Add support for using .NET framework compression when zlib is not
+   available.  This does not make pyPdf compatible with IronPython, but it
+   is a first step.
+
+ - Add support for reading the document information dictionary, and extracting
+   title, author, subject, producer and creator tags.
+
+ - Add patch to support NullObject and multiple xref streams, from Bradley
+   Lawrence.
+
+
+Version 1.5, 2006-01-28
+-----------------------
+
+- Fix a bug where merging pages did not work in "no-rename" cases when the
+  second page has an array of content streams.
+
+- Remove some debugging output that should not have been present.
+
+
+Version 1.4, 2006-01-27
+-----------------------
+
+- Add capability to merge pages from multiple PDF files into a single page
+  using the PageObject.mergePage function.  See example code (README or web
+  site) for more information.
+
+- Add ability to modify a page's MediaBox, CropBox, BleedBox, TrimBox, and
+  ArtBox properties through PageObject.  See example code (README or web site)
+  for more information.
+
+- Refactor pdf.py into multiple files: generic.py (contains objects like
+  NameObject, DictionaryObject), filters.py (contains filter code),
+  utils.py (various).  This does not affect importing PdfFileReader
+  or PdfFileWriter.
+
+- Add new decoding functions for standard PDF filters ASCIIHexDecode and
+  ASCII85Decode.
+
+- Change url and download_url to refer to new pybrary.net web site.
+
+
+Version 1.3, 2006-01-23
+-----------------------
+
+- Fix new bug introduced in 1.2 where PDF files with \r line endings did not
+  work properly anymore.  A new test suite developed with various PDF files
+  should prevent regression bugs from now on.
+
+- Fix a bug where inheriting attributes from page nodes did not work.
+
+
+Version 1.2, 2006-01-23
+-----------------------
+
+- Improved support for files with CRLF-based line endings, fixing a common
+  reported problem stating "assertion error: assert line == "%%EOF"".
+
+- Software author/maintainer is now officially a proud married person, which
+  is sure to result in better software... somehow.
+
+
+Version 1.1, 2006-01-18
+-----------------------
+
+- Add capability to rotate pages.
+
+- Improved PDF reading support to properly manage inherited attributes from
+  /Type=/Pages nodes.  This means that page groups that are rotated or have
+  different media boxes or whatever will now work properly.
+
+- Added PDF 1.5 support.  Namely cross-reference streams and object streams.
+  This release can mangle Adobe's PDFReference16.pdf successfully.
+
+
+Version 1.0, 2006-01-17
+-----------------------
+
+- First distutils-capable true public release.  Supports a wide variety of PDF
+  files that I found sitting around on my system.
+
+- Does not support some PDF 1.5 features, such as object streams,
+  cross-reference streams.
+
--- a/28
+++ b/28
@@ -0,0 +1,28 @@
+Copyright (c) 2006-2008, Mathieu Fenniak
+Some contributions copyright (c) 2007, Ashish Kulkarni <kulkarni.ashish@gmail.com>
+
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+this list of conditions and the following disclaimer in the documentation
+and/or other materials provided with the distribution.
+* The name of the author may not be used to endorse or promote products
+derived from this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -0,0 +1 @@
+include CHANGELOG
--- a/PyPDF2/__init__.py
+++ b/PyPDF2/__init__.py
@@ -0,0 +1,4 @@
+from pdf import PdfFileReader, PdfFileWriter
+from merger import PdfFileMerger
+
+__all__ = ["pdf", "PdfFileMerger"]
--- a/PyPDF2/filters.py
+++ b/PyPDF2/filters.py
@@ -0,0 +1,252 @@
+# vim: sw=4:expandtab:foldmethod=marker
+#
+# Copyright (c) 2006, Mathieu Fenniak
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+# * The name of the author may not be used to endorse or promote products
+# derived from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+
+"""
+Implementation of stream filters for PDF.
+"""
+__author__ = "Mathieu Fenniak"
+__author_email__ = "biziqe@mathieu.fenniak.net"
+
+from utils import PdfReadError
+try:
+    from cStringIO import StringIO
+except ImportError:
+    from StringIO import StringIO
+
+try:
+    import zlib
+    def decompress(data):
+        return zlib.decompress(data)
+    def compress(data):
+        return zlib.compress(data)
+except ImportError:
+    # Unable to import zlib.  Attempt to use the System.IO.Compression
+    # library from the .NET framework. (IronPython only)
+    import System
+    from System import IO, Collections, Array
+    def _string_to_bytearr(buf):
+        retval = Array.CreateInstance(System.Byte, len(buf))
+        for i in range(len(buf)):
+            retval[i] = ord(buf[i])
+        return retval
+    def _bytearr_to_string(bytes):
+        retval = ""
+        for i in range(bytes.Length):
+            retval += chr(bytes[i])
+        return retval
+    def _read_bytes(stream):
+        ms = IO.MemoryStream()
+        buf = Array.CreateInstance(System.Byte, 2048)
+        while True:
+            bytes = stream.Read(buf, 0, buf.Length)
+            if bytes == 0:
+                break
+            else:
+                ms.Write(buf, 0, bytes)
+        retval = ms.ToArray()
+        ms.Close()
+        return retval
+    def decompress(data):
+        bytes = _string_to_bytearr(data)
+        ms = IO.MemoryStream()
+        ms.Write(bytes, 0, bytes.Length)
+        ms.Position = 0  # fseek 0
+        gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Decompress)
+        bytes = _read_bytes(gz)
+        retval = _bytearr_to_string(bytes)
+        gz.Close()
+        return retval
+    def compress(data):
+        bytes = _string_to_bytearr(data)
+        ms = IO.MemoryStream()
+        gz = IO.Compression.DeflateStream(ms, IO.Compression.CompressionMode.Compress, True)
+        gz.Write(bytes, 0, bytes.Length)
+        gz.Close()
+        ms.Position = 0 # fseek 0
+        bytes = ms.ToArray()
+        retval = _bytearr_to_string(bytes)
+        ms.Close()
+        return retval
+
+
+class FlateDecode(object):
+    def decode(data, decodeParms):
+        data = decompress(data)
+        predictor = 1
+        if decodeParms:
+            predictor = decodeParms.get("/Predictor", 1)
+        # predictor 1 == no predictor
+        if predictor != 1:
+            columns = decodeParms["/Columns"]
+            # PNG prediction:
+            if predictor >= 10 and predictor <= 15:
+                output = StringIO()
+                # PNG prediction can vary from row to row
+                rowlength = columns + 1
+                assert len(data) % rowlength == 0
+                prev_rowdata = (0,) * rowlength
+                for row in xrange(len(data) / rowlength):
+                    rowdata = [ord(x) for x in data[(row*rowlength):((row+1)*rowlength)]]
+                    filterByte = rowdata[0]
+                    if filterByte == 0:
+                        pass
+                    elif filterByte == 1:
+                        for i in range(2, rowlength):
+                            rowdata[i] = (rowdata[i] + rowdata[i-1]) % 256
+                    elif filterByte == 2:
+                        for i in range(1, rowlength):
+                            rowdata[i] = (rowdata[i] + prev_rowdata[i]) % 256
+                    else:
+                        # unsupported PNG filter
+                        raise PdfReadError("Unsupported PNG filter %r" % filterByte)
+                    prev_rowdata = rowdata
+                    output.write(''.join([chr(x) for x in rowdata[1:]]))
+                data = output.getvalue()
+            else:
+                # unsupported predictor
+                raise PdfReadError("Unsupported flatedecode predictor %r" % predictor)
+        return data
+    decode = staticmethod(decode)
+
+    def encode(data):
+        return compress(data)
+    encode = staticmethod(encode)
+
+class ASCIIHexDecode(object):
+    def decode(data, decodeParms=None):
+        retval = ""
+        char = ""
+        x = 0
+        while True:
+            c = data[x]
+            if c == ">":
+                break
+            elif c.isspace():
+                x += 1
+                continue
+            char += c
+            if len(char) == 2:
+                retval += chr(int(char, base=16))
+                char = ""
+            x += 1
+        assert char == ""
+        return retval
+    decode = staticmethod(decode)
+
+class ASCII85Decode(object):
+    def decode(data, decodeParms=None):
+        retval = ""
+        group = []
+        x = 0
+        hitEod = False
+        # remove all whitespace from data
+        data = [y for y in data if not (y in ' \n\r\t')]
+        while not hitEod:
+            c = data[x]
+            if len(retval) == 0 and c == "<" and data[x+1] == "~":
+                x += 2
+                continue
+            #elif c.isspace():
+            #    x += 1
+            #    continue
+            elif c == 'z':
+                assert len(group) == 0
+                retval += '\x00\x00\x00\x00'
+                # no "continue" here: fall through so x is advanced below
+            elif c == "~" and data[x+1] == ">":
+                if len(group) != 0:
+                    # cannot have a final group of just 1 char
+                    assert len(group) > 1
+                    cnt = len(group) - 1
+                    group += [ 85, 85, 85 ]
+                    hitEod = cnt
+                else:
+                    break
+            else:
+                c = ord(c) - 33
+                assert c >= 0 and c < 85
+                group += [ c ]
+            if len(group) >= 5:
+                b = group[0] * (85**4) + \
+                    group[1] * (85**3) + \
+                    group[2] * (85**2) + \
+                    group[3] * 85 + \
+                    group[4]
+                assert b <= (2**32 - 1)
+                c4 = chr((b >> 0) % 256)
+                c3 = chr((b >> 8) % 256)
+                c2 = chr((b >> 16) % 256)
+                c1 = chr(b >> 24)
+                retval += (c1 + c2 + c3 + c4)
+                if hitEod:
+                    retval = retval[:-4+hitEod]
+                group = []
+            x += 1
+        return retval
+    decode = staticmethod(decode)
+
+def decodeStreamData(stream):
+    from generic import NameObject
+    filters = stream.get("/Filter", ())
+    if len(filters) and not isinstance(filters[0], NameObject):
+        # we have a single filter instance
+        filters = (filters,)
+    data = stream._data
+    for filterType in filters:
+        if filterType == "/FlateDecode":
+            data = FlateDecode.decode(data, stream.get("/DecodeParms"))
+        elif filterType == "/ASCIIHexDecode":
+            data = ASCIIHexDecode.decode(data)
+        elif filterType == "/ASCII85Decode":
+            data = ASCII85Decode.decode(data)
+        elif filterType == "/Crypt":
+            decodeParams = stream.get("/DecodeParms", {})
+            if "/Name" not in decodeParams and "/Type" not in decodeParams:
+                pass
+            else:
+                raise NotImplementedError("/Crypt filter with /Name or /Type not supported yet")
+        else:
+            # unsupported filter
+            raise NotImplementedError("unsupported filter %s" % filterType)
+    return data
+
+if __name__ == "__main__":
+    assert "abc" == ASCIIHexDecode.decode('61\n626\n3>')
+
+    ascii85Test = """
+     <~9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,
+     O<DJ+*.@<*K0@<6L(Df-\\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKY
+     i(DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIa
+     l(DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G
+     >uD.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c~>
+    """
+    ascii85_originalText="Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
+    assert ASCII85Decode.decode(ascii85Test) == ascii85_originalText
+
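The ASCII85Decode loop in filters.py can be cross-checked against the standard library. What follows is only a sketch, written for Python 3 (the code in this commit targets Python 2). The names `ascii85_decode` and `_group_to_bytes` are hypothetical and are not part of the package. It assumes Adobe-style framing with a `~>` end marker, and it pads a short final group with the value 84 (`u`), where the diff pads with 85. Round-tripping through `base64.a85encode` exercises the `z` shortcut and the partial final group.

```python
import base64


def _group_to_bytes(group):
    # Five base-85 digits form one big-endian 32-bit word.
    word = 0
    for digit in group:
        word = word * 85 + digit
    if word > 0xFFFFFFFF:
        raise ValueError("ASCII85 group overflows 32 bits")
    return word.to_bytes(4, "big")


def ascii85_decode(text):
    """Decode Adobe-framed ASCII85 text (str) to bytes. Hypothetical helper."""
    # Whitespace is insignificant, so drop it before looking at markers.
    chars = "".join(text.split())
    if chars.startswith("<~"):
        chars = chars[2:]
    end = chars.find("~>")
    if end == -1:
        raise ValueError("missing ~> end-of-data marker")
    chars = chars[:end]

    out = bytearray()
    group = []
    for c in chars:
        if c == "z":
            # 'z' abbreviates four zero bytes and may not split a group.
            if group:
                raise ValueError("'z' inside a group")
            out += b"\x00\x00\x00\x00"
            continue
        value = ord(c) - 33
        if not 0 <= value < 85:
            raise ValueError("invalid ASCII85 character %r" % c)
        group.append(value)
        if len(group) == 5:
            out += _group_to_bytes(group)
            group = []

    if group:
        # A final group of n characters (n >= 2) carries n - 1 bytes.
        if len(group) == 1:
            raise ValueError("final group cannot be a single character")
        keep = len(group) - 1
        group += [84] * (5 - len(group))
        out += _group_to_bytes(group)[:keep]
    return bytes(out)


sample = b"\x00\x00\x00\x00Man is distinguished"
encoded = base64.a85encode(sample, adobe=True).decode("ascii")
assert ascii85_decode(encoded) == sample
```

Comparing against an independent encoder avoids relying on a hand-copied test vector.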
--- a/PyPDF2/generic.py
+++ b/PyPDF2/generic.py
--- a/PyPDF2/merger.py
+++ b/PyPDF2/merger.py
@@ -0,0 +1,401 @@
+# vim: sw=4:expandtab:foldmethod=marker
+#
+# Copyright (c) 2006, Mathieu Fenniak
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+# * The name of the author may not be used to endorse or promote products
+# derived from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+from generic import *
+from pdf import PdfFileReader, PdfFileWriter, Destination
+
+class _MergedPage(object):
+    """
+    _MergedPage is used internally by PdfFileMerger to collect necessary information on each page that is being merged.
+    """
+    def __init__(self, pagedata, src, id):
+        self.src = src
+        self.pagedata = pagedata
+        self.out_pagedata = None
+        self.id = id
+        
+class PdfFileMerger(object):
+    """
+    PdfFileMerger merges multiple PDFs into a single PDF. It can concatenate, 
+    slice, insert, or any combination of the above.
+    
+    See the functions "merge" (or "append") and "write" (or "overwrite") for
+    usage information.
+    """
+    
+    def __init__(self):
+        """
+        >>> PdfFileMerger()
+        
+        Initializes a PdfFileMerger, no parameters required
+        """
+        self.inputs = []
+        self.pages = []
+        self.output = PdfFileWriter()
+        self.bookmarks = []
+        self.named_dests = []
+        self.id_count = 0
+        
+    def merge(self, position, fileobj, bookmark=None, pages=None, import_bookmarks=True):
+        """
+        >>> merge(position, file, bookmark=None, pages=None, import_bookmarks=True)
+        
+        Merges the pages from the source document specified by "file" into the output
+        file at the page number specified by "position".
+        
+        Optionally, you may specify a bookmark to be applied at the beginning of the 
+        included file by supplying the text of the bookmark in the "bookmark" parameter.
+        
+        You may prevent the source document's bookmarks from being imported by
+        specifying "import_bookmarks" as False.
+        
+        You may also use the "pages" parameter to merge only the specified range of 
+        pages from the source document into the output document.
+        """
+        
+        my_file = False
+        if type(fileobj) in (str, unicode):
+            fileobj = file(fileobj, 'rb')
+            my_file = True
+            
+        if type(fileobj) == PdfFileReader:
+            pdfr = fileobj
+            fileobj = pdfr.file
+        else:
+            pdfr = PdfFileReader(fileobj)
+        
+        # Find the range of pages to merge
+        if pages == None:
+            pages = (0, pdfr.getNumPages())
+        elif type(pages) in (int, float, str, unicode):
+            raise TypeError('"pages" must be a tuple of (start, end)')
+        
+        srcpages = []
+        
+        if bookmark:
+            bookmark = Bookmark(TextStringObject(bookmark), NumberObject(self.id_count), NameObject('/Fit'))
+        
+        outline = []
+        if import_bookmarks:
+            outline = pdfr.getOutlines()
+            outline = self._trim_outline(pdfr, outline, pages)
+        
+        if bookmark:
+            self.bookmarks += [bookmark, outline]
+        else:
+            self.bookmarks += outline
+        
+        dests = pdfr.namedDestinations
+        dests = self._trim_dests(pdfr, dests, pages)
+        self.named_dests += dests
+        
+        # Gather all the pages that are going to be merged
+        for i in range(*pages):
+            pg = pdfr.getPage(i)
+            
+            id = self.id_count
+            self.id_count += 1
+            
+            mp = _MergedPage(pg, pdfr, id)
+            
+            srcpages.append(mp)
+
+        self._associate_dests_to_pages(srcpages)
+        self._associate_bookmarks_to_pages(srcpages)
+            
+        
+        # Slice to insert the pages at the specified position
+        self.pages[position:position] = srcpages
+        
+        # Keep track of our input files so we can close them later
+        self.inputs.append((fileobj, pdfr, my_file))
+        
+        
+    def append(self, fileobj, bookmark=None, pages=None, import_bookmarks=True):
+        """
+        >>> append(file, bookmark=None, pages=None, import_bookmarks=True):
+        
+        Identical to the "merge" function, but assumes you want to concatenate all pages
+        onto the end of the file instead of specifying a position.
+        """
+        
+        self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
+        
+    
+    def write(self, fileobj):
+        """
+        >>> write(file)
+        
+        Writes all data that has been merged to "file" (which can be a filename or any
+        kind of file-like object)
+        """
+        my_file = False
+        if type(fileobj) in (str, unicode):
+            fileobj = file(fileobj, 'wb')
+            my_file = True
+
+
+        # Add pages to the PdfFileWriter
+        for page in self.pages:
+            self.output.addPage(page.pagedata)
+            page.out_pagedata = self.output.getReference(self.output._pages.getObject()["/Kids"][-1].getObject())
+
+
+        # Once all pages are added, create bookmarks to point at those pages
+        self._write_dests()
+        self._write_bookmarks()
+        
+        # Write the output to the file   
+        self.output.write(fileobj)
+        
+        if my_file:
+            fileobj.close()
+
+
+        
+    def close(self):
+        """
+        >>> close()
+        
+        Shuts all file descriptors (input and output) and clears all memory usage
+        """
+        self.pages = []
+        for fo, pdfr, mine in self.inputs:
+            if mine:
+                fo.close()
+        
+        self.inputs = []
+        self.output = None
+    
+    def _trim_dests(self, pdf, dests, pages):
+        """
+        Removes any named destinations that are not a part of the specified page set
+        """
+        new_dests = []
+        prev_header_added = True
+        for k, o in dests.items():
+            for j in range(*pages):
+                if pdf.getPage(j).getObject() == o['/Page'].getObject():
+                    o[NameObject('/Page')] = o['/Page'].getObject()
+                    assert str(k) == str(o['/Title'])
+                    new_dests.append(o)
+                    break
+        return new_dests
+    
+    def _trim_outline(self, pdf, outline, pages):
+        """
+        Removes any outline/bookmark entries that are not a part of the specified page set
+        """
+        new_outline = []
+        prev_header_added = True
+        for i, o in enumerate(outline):
+            if type(o) == list:
+                sub = self._trim_outline(pdf, o, pages)
+                if sub:
+                    if not prev_header_added:
+                        new_outline.append(outline[i-1])
+                    new_outline.append(sub)
+            else:
+                prev_header_added = False
+                for j in range(*pages):
+                    if pdf.getPage(j).getObject() == o['/Page'].getObject():
+                        o[NameObject('/Page')] = o['/Page'].getObject()
+                        new_outline.append(o)
+                        prev_header_added = True
+                        break
+        return new_outline
+   
+    def _write_dests(self):
+        dests = self.named_dests
+        
+        for v in dests:
+            pageno = None
+            pdf = None
+            if v.has_key('/Page'):
+                for i, p in enumerate(self.pages):
+                    if p.id == v['/Page']:
+                        v[NameObject('/Page')] = p.out_pagedata
+                        pageno = i
+                        pdf = p.src
+            if pageno != None:
+                self.output.addNamedDestinationObject(v)
+ 
+    def _write_bookmarks(self, bookmarks=None, parent=None):
+        
+        if bookmarks == None:
+            bookmarks = self.bookmarks
+        
+
+        last_added = None
+        for b in bookmarks:
+            if type(b) == list:
+                self._write_bookmarks(b, last_added)
+                continue
+                
+            pageno = None
+            pdf = None
+            if b.has_key('/Page'):
+                for i, p in enumerate(self.pages):
+                    if p.id == b['/Page']:
+                        b[NameObject('/Page')] = p.out_pagedata
+                        pageno = i
+                        pdf = p.src
+            if pageno != None:
+                last_added = self.output.addBookmarkDestination(b, parent)
+    
+
+    def _associate_dests_to_pages(self, pages):
+        for nd in self.named_dests:
+            pageno = None
+            np = nd['/Page']
+            
+            if type(np) == NumberObject:
+                continue
+            
+            for p in pages:
+                if np.getObject() == p.pagedata.getObject():
+                    pageno = p.id
+            
+            if pageno != None:
+                nd[NameObject('/Page')] = NumberObject(pageno)
+            else:
+                raise ValueError("Unresolved named destination '%s'" % (nd['/Title'],))
+    
+    def _associate_bookmarks_to_pages(self, pages, bookmarks=None):
+        if bookmarks == None:
+            bookmarks = self.bookmarks
+
+        for b in bookmarks:
+            if type(b) == list:
+                self._associate_bookmarks_to_pages(pages, b)
+                continue
+                
+            pageno = None
+            bp = b['/Page']
+            
+            if type(bp) == NumberObject:
+                continue
+                
+            for p in pages:
+                if bp.getObject() == p.pagedata.getObject():
+                    pageno = p.id
+            
+            if pageno != None:
+                b[NameObject('/Page')] = NumberObject(pageno)
+            else:
+                raise ValueError("Unresolved bookmark '%s'" % (b['/Title'],))
+                
+    def findBookmark(self, bookmark, root=None):
+        if root == None:
+            root = self.bookmarks
+
+        for i, b in enumerate(root):
+            if type(b) == list:
+                res = self.findBookmark(bookmark, b)
+                if res:
+                    return [i] + res
+            elif b == bookmark or b['/Title'] == bookmark:
+                return [i]
+
+        return None
+
+    def addBookmark(self, title, pagenum, parent=None):
+        """
+        Add a bookmark to the pdf, using the specified title and pointing at 
+        the specified page number. A parent can be specified to make this a
+        nested bookmark below the parent.
+        """
+
+        if parent == None:
+            iloc = [len(self.bookmarks)-1]
+        elif type(parent) == list:
+            iloc = parent
+        else:
+            iloc = self.findBookmark(parent)
+
+        dest = Bookmark(TextStringObject(title), NumberObject(pagenum), NameObject('/FitH'), NumberObject(826))
+
+        if parent == None:
+            self.bookmarks.append(dest)
+        else:
+            bmparent = self.bookmarks
+            for i in iloc[:-1]:
+                bmparent = bmparent[i]
+            npos = iloc[-1]+1
+            if npos < len(bmparent) and type(bmparent[npos]) == list:
+                bmparent[npos].append(dest)
+            else:
+                bmparent.insert(npos, [dest])
+
+        
+    def addNamedDestination(self, title, pagenum):
+        """
+        Add a destination to the pdf, using the specified title and pointing
+        at the specified page number.
+        """
+        
+        dest = Destination(TextStringObject(title), NumberObject(pagenum), NameObject('/FitH'), NumberObject(826))
+        self.named_dests.append(dest)
+
+
+class OutlinesObject(list):
+    def __init__(self, pdf, tree, parent=None):
+        list.__init__(self)
+        self.tree = tree
+        self.pdf = pdf
+        self.parent = parent
+    
+    def remove(self, index):
+        obj = self[index]
+        del self[index]
+        self.tree.removeChild(obj)
+        
+    def add(self, title, page):
+        pageRef = self.pdf.getObject(self.pdf._pages)['/Kids'][page]
+        action = DictionaryObject()
+        action.update({
+            NameObject('/D') : ArrayObject([pageRef, NameObject('/FitH'), NumberObject(826)]),
+            NameObject('/S') : NameObject('/GoTo')
+        })
+        actionRef = self.pdf._addObject(action)
+        bookmark = TreeObject()
+
+        bookmark.update({
+            NameObject('/A') : actionRef,
+            NameObject('/Title') : createStringObject(title),
+        })
+
+        self.pdf._addObject(bookmark)
+
+        self.tree.addChild(bookmark)
+        
+    def removeAll(self):
+        for child in [x for x in self.tree.children()]:
+            self.tree.removeChild(child)
+            self.pop()
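
Note on `findBookmark`: the bookmark list appears to be a nested structure in which a plain list following an entry holds that entry's children, and the method returns the index path to the first match. The sketch below is a standalone illustration of that search, not the committed code. It is written for Python 3, uses plain dicts as stand-ins for `Bookmark` objects, and the name `find_bookmark` and the sample outline are hypothetical.

```python
def find_bookmark(target, root):
    """Return the index path to the first bookmark titled `target`, or None."""
    for i, entry in enumerate(root):
        if isinstance(entry, list):
            # A nested list holds child bookmarks; search it recursively.
            res = find_bookmark(target, entry)
            if res:
                return [i] + res
        elif entry == target or entry["/Title"] == target:
            return [i]
    return None


# Hypothetical outline: the list at index 1 holds the children of "Chapter 1".
outline = [
    {"/Title": "Chapter 1"},
    [{"/Title": "Section 1.1"}, {"/Title": "Section 1.2"}],
    {"/Title": "Chapter 2"},
]

path_nested = find_bookmark("Section 1.2", outline)
path_top = find_bookmark("Chapter 2", outline)
path_missing = find_bookmark("Appendix", outline)
```

Handling nested lists in their own branch keeps the `'/Title'` lookup away from list entries, which do not support string keys.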
--- a/PyPDF2/pdf.py
+++ b/PyPDF2/pdf.py
--- a/PyPDF2/utils.py
+++ b/PyPDF2/utils.py
@ -0,0 +1,125 @@
+# vim: sw=4:expandtab:foldmethod=marker
+#
+# Copyright (c) 2006, Mathieu Fenniak
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+# * The name of the author may not be used to endorse or promote products
+# derived from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+# POSSIBILITY OF SUCH DAMAGE.
+
+
+"""
+Utility functions for PDF library.
+"""
+__author__ = "Mathieu Fenniak"
+__author_email__ = "biziqe@mathieu.fenniak.net"
+
+#ENABLE_PSYCO = False
+#if ENABLE_PSYCO:
+#    try:
+#        import psyco
+#    except ImportError:
+#        ENABLE_PSYCO = False
+#
+#if not ENABLE_PSYCO:
+#    class psyco:
+#        def proxy(func):
+#            return func
+#        proxy = staticmethod(proxy)
+
+def readUntilWhitespace(stream, maxchars=None):
+    txt = ""
+    while True:
+        tok = stream.read(1)
+        if tok.isspace() or not tok:
+            break
+        txt += tok
+        if len(txt) == maxchars:
+            break
+    return txt
+
+def readNonWhitespace(stream):
+    tok = ' '
+    while tok == '\n' or tok == '\r' or tok == ' ' or tok == '\t':
+        tok = stream.read(1)
+    return tok
+
+class ConvertFunctionsToVirtualList(object):
+    def __init__(self, lengthFunction, getFunction):
+        self.lengthFunction = lengthFunction
+        self.getFunction = getFunction
+
+    def __len__(self):
+        return self.lengthFunction()
+
+    def __getitem__(self, index):
+        if not isinstance(index, int):
+            raise TypeError("sequence indices must be integers")
+        len_self = len(self)
+        if index < 0:
+            # support negative indexes
+            index = len_self + index
+        if index < 0 or index >= len_self:
+            raise IndexError("sequence index out of range")
+        return self.getFunction(index)
+
+def RC4_encrypt(key, plaintext):
+    S = [i for i in range(256)]
+    j = 0
+    for i in range(256):
+        j = (j + S[i] + ord(key[i % len(key)])) % 256
+        S[i], S[j] = S[j], S[i]
+    i, j = 0, 0
+    retval = ""
+    for x in range(len(plaintext)):
+        i = (i + 1) % 256
+        j = (j + S[i]) % 256
+        S[i], S[j] = S[j], S[i]
+        t = S[(S[i] + S[j]) % 256]
+        retval += chr(ord(plaintext[x]) ^ t)
+    return retval
+
+def matrixMultiply(a, b):
+    return [[sum([float(i)*float(j)
+                  for i, j in zip(row, col)]
+                ) for col in zip(*b)]
+            for row in a]
+
+class PyPdfError(Exception):
+    pass
+
+class PdfReadError(PyPdfError):
+    pass
+
+class PageSizeNotDefinedError(PyPdfError):
+    pass
+    
+class PdfReadWarning(UserWarning):
+    pass
+
+if __name__ == "__main__":
+    # test RC4
+    out = RC4_encrypt("Key", "Plaintext")
+    print repr(out)
+    pt = RC4_encrypt("Key", out)
+    print repr(pt)
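
Note on the `__main__` self-test: it relies on RC4 being symmetric, so encrypting the ciphertext again with the same key should return the plaintext. The sketch below restates the same algorithm for Python 3, working on `bytes` instead of Python 2 `str`. The name `rc4` is hypothetical and is not part of `utils.py`.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call both encrypts and decrypts."""
    # Key-scheduling: permute the state array under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Keystream generation: XOR each input byte with one keystream byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
roundtrip = rc4(b"Key", ciphertext)
```

Building the result in a `bytearray` avoids the repeated string concatenation used in the Python 2 version.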
--- a/PyPDF2/xmp.py
+++ b/PyPDF2/xmp.py
@ -0,0 +1,355 @@
+import re
+import datetime
+import decimal
+from generic import PdfObject
+from xml.dom import getDOMImplementation
+from xml.dom.minidom import parseString
+
+RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
+DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
+XMP_NAMESPACE = "http://ns.adobe.com/xap/1.0/"
+PDF_NAMESPACE = "http://ns.adobe.com/pdf/1.3/"
+XMPMM_NAMESPACE = "http://ns.adobe.com/xap/1.0/mm/"
+
+# What is the PDFX namespace, you might ask?  I might ask that too.  It's
+# a completely undocumented namespace used to place "custom metadata"
+# properties, which are arbitrary metadata properties with no semantic or
+# documented meaning.  Elements in the namespace are key/value-style storage,
+# where the element name is the key and the content is the value.  The keys
+# are transformed into valid XML identifiers by substituting an invalid
+# identifier character with \u2182 followed by the unicode hex ID of the
+# original character.  A key like "my car" is therefore "my\u21820020car".
+#
+# \u2182, in case you're wondering, is the unicode character
+# \u{ROMAN NUMERAL TEN THOUSAND}, a straightforward and obvious choice for
+# escaping characters.
+#
+# Intentional users of the pdfx namespace should be shot on sight.  A
+# custom data schema and sensical XML elements could be used instead, as is
+# suggested by Adobe's own documentation on XMP (under "Extensibility of
+# Schemas").
+#
+# Information presented here on the /pdfx/ schema is a result of limited
+# reverse engineering, and does not constitute a full specification.
+PDFX_NAMESPACE = "http://ns.adobe.com/pdfx/1.3/"
+
+iso8601 = re.compile(r"""
+        (?P<year>[0-9]{4})
+        (-
+            (?P<month>[0-9]{2})
+            (-
+                (?P<day>[0-9]+)
+                (T
+                    (?P<hour>[0-9]{2}):
+                    (?P<minute>[0-9]{2})
+                    (:(?P<second>[0-9]{2}(\.[0-9]+)?))?
+                    (?P<tzd>Z|[-+][0-9]{2}:[0-9]{2})
+                )?
+            )?
+        )?
+        """, re.VERBOSE)
+
+##
+# An object that represents Adobe XMP metadata.
+class XmpInformation(PdfObject):
+
+    def __init__(self, stream):
+        self.stream = stream
+        docRoot = parseString(self.stream.getData())
+        self.rdfRoot = docRoot.getElementsByTagNameNS(RDF_NAMESPACE, "RDF")[0]
+        self.cache = {}
+
+    def writeToStream(self, stream, encryption_key):
+        self.stream.writeToStream(stream, encryption_key)
+
+    def getElement(self, aboutUri, namespace, name):
+        for desc in self.rdfRoot.getElementsByTagNameNS(RDF_NAMESPACE, "Description"):
+            if desc.getAttributeNS(RDF_NAMESPACE, "about") == aboutUri:
+                attr = desc.getAttributeNodeNS(namespace, name)
+                if attr != None:
+                    yield attr
+                for element in desc.getElementsByTagNameNS(namespace, name):
+                    yield element
+
+    def getNodesInNamespace(self, aboutUri, namespace):
+        for desc in self.rdfRoot.getElementsByTagNameNS(RDF_NAMESPACE, "Description"):
+            if desc.getAttributeNS(RDF_NAMESPACE, "about") == aboutUri:
+                for i in range(desc.attributes.length):
+                    attr = desc.attributes.item(i)
+                    if attr.namespaceURI == namespace:
+                        yield attr
+                for child in desc.childNodes:
+                    if child.namespaceURI == namespace:
+                        yield child
+
+    def _getText(self, element):
+        text = ""
+        for child in element.childNodes:
+            if child.nodeType == child.TEXT_NODE:
+                text += child.data
+        return text
+
+    def _converter_string(value):
+        return value
+
+    def _converter_date(value):
+        m = iso8601.match(value)
+        year = int(m.group("year"))
+        month = int(m.group("month") or "1")
+        day = int(m.group("day") or "1")
+        hour = int(m.group("hour") or "0")
+        minute = int(m.group("minute") or "0")
+        second = decimal.Decimal(m.group("second") or "0")
+        seconds = second.to_integral(decimal.ROUND_FLOOR)
+        # datetime expects integers; the fractional part becomes microseconds.
+        microseconds = (second - seconds) * 1000000
+        tzd = m.group("tzd") or "Z"
+        dt = datetime.datetime(year, month, day, hour, minute, int(seconds), int(microseconds))
+        if tzd != "Z":
+            tzd_hours, tzd_minutes = [int(x) for x in tzd[1:].split(":")]
+            offset = datetime.timedelta(hours=tzd_hours, minutes=tzd_minutes)
+            # Take the sign from the string itself so that an offset such as
+            # "+00:30" is handled correctly, then shift local time to UTC.
+            if tzd[0] == "-":
+                offset = -offset
+            dt = dt - offset
+        return dt
+    _test_converter_date = staticmethod(_converter_date)
+
+    def _getter_bag(namespace, name, converter):
+        def get(self):
+            cached = self.cache.get(namespace, {}).get(name)
+            if cached:
+                return cached
+            retval = []
+            for element in self.getElement("", namespace, name):
+                bags = element.getElementsByTagNameNS(RDF_NAMESPACE, "Bag")
+                if len(bags):
+                    for bag in bags:
+                        for item in bag.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
+                            value = self._getText(item)
+                            value = converter(value)
+                            retval.append(value)
+            ns_cache = self.cache.setdefault(namespace, {})
+            ns_cache[name] = retval
+            return retval
+        return get
+
+    def _getter_seq(namespace, name, converter):
+        def get(self):
+            cached = self.cache.get(namespace, {}).get(name)
+            if cached:
+                return cached
+            retval = []
+            for element in self.getElement("", namespace, name):
+                seqs = element.getElementsByTagNameNS(RDF_NAMESPACE, "Seq")
+                if len(seqs):
+                    for seq in seqs:
+                        for item in seq.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
+                            value = self._getText(item)
+                            value = converter(value)
+                            retval.append(value)
+                else:
+                    value = converter(self._getText(element))
+                    retval.append(value)
+            ns_cache = self.cache.setdefault(namespace, {})
+            ns_cache[name] = retval
+            return retval
+        return get
+
+    def _getter_langalt(namespace, name, converter):
+        def get(self):
+            cached = self.cache.get(namespace, {}).get(name)
+            if cached:
+                return cached
+            retval = {}
+            for element in self.getElement("", namespace, name):
+                alts = element.getElementsByTagNameNS(RDF_NAMESPACE, "Alt")
+                if len(alts):
+                    for alt in alts:
+                        for item in alt.getElementsByTagNameNS(RDF_NAMESPACE, "li"):
+                            value = self._getText(item)
+                            value = converter(value)
+                            retval[item.getAttribute("xml:lang")] = value
+                else:
+                    retval["x-default"] = converter(self._getText(element))
+            ns_cache = self.cache.setdefault(namespace, {})
+            ns_cache[name] = retval
+            return retval
+        return get
+
+    def _getter_single(namespace, name, converter):
+        def get(self):
+            cached = self.cache.get(namespace, {}).get(name)
+            if cached:
+                return cached
+            value = None
+            for element in self.getElement("", namespace, name):
+                if element.nodeType == element.ATTRIBUTE_NODE:
+                    value = element.nodeValue
+                else:
+                    value = self._getText(element)
+                break
+            if value != None:
+                value = converter(value)
+            ns_cache = self.cache.setdefault(namespace, {})
+            ns_cache[name] = value
+            return value
+        return get
+
+    ##
+    # Contributors to the resource (other than the authors).  An unsorted
+    # array of names.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_contributor = property(_getter_bag(DC_NAMESPACE, "contributor", _converter_string))
+
+    ##
+    # Text describing the extent or scope of the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_coverage = property(_getter_single(DC_NAMESPACE, "coverage", _converter_string))
+
+    ##
+    # A sorted array of names of the authors of the resource, listed in order
+    # of precedence.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_creator = property(_getter_seq(DC_NAMESPACE, "creator", _converter_string))
+
+    ##
+    # A sorted array of dates (datetime.datetime instances) of significance to
+    # the resource.  The dates and times are in UTC.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_date = property(_getter_seq(DC_NAMESPACE, "date", _converter_date))
+
+    ##
+    # A language-keyed dictionary of textual descriptions of the content of the
+    # resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_description = property(_getter_langalt(DC_NAMESPACE, "description", _converter_string))
+
+    ##
+    # The mime-type of the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_format = property(_getter_single(DC_NAMESPACE, "format", _converter_string))
+
+    ##
+    # Unique identifier of the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_identifier = property(_getter_single(DC_NAMESPACE, "identifier", _converter_string))
+
+    ##
+    # An unordered array specifying the languages used in the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_language = property(_getter_bag(DC_NAMESPACE, "language", _converter_string))
+
+    ##
+    # An unordered array of publisher names.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_publisher = property(_getter_bag(DC_NAMESPACE, "publisher", _converter_string))
+
+    ##
+    # An unordered array of text descriptions of relationships to other
+    # documents.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_relation = property(_getter_bag(DC_NAMESPACE, "relation", _converter_string))
+
+    ##
+    # A language-keyed dictionary of textual descriptions of the rights the
+    # user has to this resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_rights = property(_getter_langalt(DC_NAMESPACE, "rights", _converter_string))
+
+    ##
+    # Unique identifier of the work from which this resource was derived.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_source = property(_getter_single(DC_NAMESPACE, "source", _converter_string))
+
+    ##
+    # An unordered array of descriptive phrases or keywords that specify the
+    # topic of the content of the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_subject = property(_getter_bag(DC_NAMESPACE, "subject", _converter_string))
+
+    ##
+    # A language-keyed dictionary of the title of the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_title = property(_getter_langalt(DC_NAMESPACE, "title", _converter_string))
+
+    ##
+    # An unordered array of textual descriptions of the document type.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    dc_type = property(_getter_bag(DC_NAMESPACE, "type", _converter_string))
+
+    ##
+    # An unformatted text string representing document keywords.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    pdf_keywords = property(_getter_single(PDF_NAMESPACE, "Keywords", _converter_string))
+
+    ##
+    # The PDF file version, for example 1.0, 1.3.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    pdf_pdfversion = property(_getter_single(PDF_NAMESPACE, "PDFVersion", _converter_string))
+
+    ##
+    # The name of the tool that created the PDF document.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    pdf_producer = property(_getter_single(PDF_NAMESPACE, "Producer", _converter_string))
+
+    ##
+    # The date and time the resource was originally created.  The date and
+    # time are returned as a UTC datetime.datetime object.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    xmp_createDate = property(_getter_single(XMP_NAMESPACE, "CreateDate", _converter_date))
+    
+    ##
+    # The date and time the resource was last modified.  The date and time
+    # are returned as a UTC datetime.datetime object.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    xmp_modifyDate = property(_getter_single(XMP_NAMESPACE, "ModifyDate", _converter_date))
+
+    ##
+    # The date and time that any metadata for this resource was last
+    # changed.  The date and time are returned as a UTC datetime.datetime
+    # object.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    xmp_metadataDate = property(_getter_single(XMP_NAMESPACE, "MetadataDate", _converter_date))
+
+    ##
+    # The name of the first known tool used to create the resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    xmp_creatorTool = property(_getter_single(XMP_NAMESPACE, "CreatorTool", _converter_string))
+
+    ##
+    # The common identifier for all versions and renditions of this resource.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    xmpmm_documentId = property(_getter_single(XMPMM_NAMESPACE, "DocumentID", _converter_string))
+
+    ##
+    # An identifier for a specific incarnation of a document, updated each
+    # time a file is saved.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    xmpmm_instanceId = property(_getter_single(XMPMM_NAMESPACE, "InstanceID", _converter_string))
+
+    def custom_properties(self):
+        if not hasattr(self, "_custom_properties"):
+            self._custom_properties = {}
+            for node in self.getNodesInNamespace("", PDFX_NAMESPACE):
+                key = node.localName
+                while True:
+                    # see documentation about PDFX_NAMESPACE earlier in file
+                    idx = key.find(u"\u2182")
+                    if idx == -1:
+                        break
+                    # unichr, not chr: the escaped code point may exceed 0xFF
+                    key = key[:idx] + unichr(int(key[idx+1:idx+5], base=16)) + key[idx+5:]
+                if node.nodeType == node.ATTRIBUTE_NODE:
+                    value = node.nodeValue
+                else:
+                    value = self._getText(node)
+                self._custom_properties[key] = value
+        return self._custom_properties
+
+    ##
+    # Retrieves custom metadata properties defined in the undocumented pdfx
+    # metadata schema.
+    # <p>Stability: Added in v1.12, will exist for all future v1.x releases.
+    # @return Returns a dictionary of key/value items for custom metadata
+    # properties.
+    custom_properties = property(custom_properties)
+
+
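
Note on the pdfx key escaping described in the `PDFX_NAMESPACE` comment: each character that is not valid in an XML identifier is replaced by U+2182 followed by four hex digits. The sketch below shows that decoding step on its own. It is written for Python 3 (where `chr` covers the full Unicode range), and the names `PDFX_ESCAPE` and `decode_pdfx_key` are hypothetical, not part of `xmp.py`.

```python
import re

# U+2182 ROMAN NUMERAL TEN THOUSAND marks an escaped character.
PDFX_ESCAPE = "\u2182"

# The marker followed by exactly four hex digits, as described in the comment.
_ESCAPED_CHAR = re.compile(PDFX_ESCAPE + "([0-9A-Fa-f]{4})")


def decode_pdfx_key(key: str) -> str:
    """Replace every escaped sequence with the character it encodes."""
    return _ESCAPED_CHAR.sub(lambda m: chr(int(m.group(1), 16)), key)


decoded = decode_pdfx_key("my\u21820020car")
untouched = decode_pdfx_key("plainkey")
```

Unlike the loop in `custom_properties`, this version leaves a marker that is not followed by four hex digits in place rather than raising an error; which behaviour is preferable depends on how malformed keys should be treated.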
--- a/38
+++ b/38
@ -0,0 +1,38 @@
+Example:
+
+    from pyPdf import PdfFileWriter, PdfFileReader
+
+    output = PdfFileWriter()
+    input1 = PdfFileReader(file("document1.pdf", "rb"))
+
+    # add page 1 from input1 to output document, unchanged
+    output.addPage(input1.getPage(0))
+
+    # add page 2 from input1, but rotated clockwise 90 degrees
+    output.addPage(input1.getPage(1).rotateClockwise(90))
+
+    # add page 3 from input1, rotated the other way:
+    output.addPage(input1.getPage(2).rotateCounterClockwise(90))
+    # alt: output.addPage(input1.getPage(2).rotateClockwise(270))
+
+    # add page 4 from input1, but first add a watermark from another pdf:
+    page4 = input1.getPage(3)
+    watermark = PdfFileReader(file("watermark.pdf", "rb"))
+    page4.mergePage(watermark.getPage(0))
+    output.addPage(page4)
+
+    # add page 5 from input1, but crop it to half size:
+    page5 = input1.getPage(4)
+    page5.mediaBox.upperRight = (
+        page5.mediaBox.getUpperRight_x() / 2,
+        page5.mediaBox.getUpperRight_y() / 2
+    )
+    output.addPage(page5)
+
+    # print how many pages input1 has:
+    print "document1.pdf has %s pages." % input1.getNumPages()
+
+    # finally, write "output" to document-output.pdf
+    outputStream = file("document-output.pdf", "wb")
+    output.write(outputStream)
+    outputStream.close()
+
+
--- a/setup.py
+++ b/setup.py
@ -0,0 +1,40 @@
+#!/usr/bin/env python
+
+from distutils.core import setup
+
+long_description = """
+A Pure-Python library built as a PDF toolkit.  It is capable of:
+    
+- extracting document information (title, author, ...),
+- splitting documents page by page,
+- merging documents page by page,
+- cropping pages,
+- merging multiple pages into a single page,
+- encrypting and decrypting PDF files.
+
+By being Pure-Python, it should run on any Python platform without any
+dependencies on external libraries.  It can also work entirely on StringIO
+objects rather than file streams, allowing for PDF manipulation in memory.
+It is therefore a useful tool for websites that manage or manipulate PDFs.
+"""
+
+setup(
+        name="pyPdf",
+        version="1.12",
+        description="PDF toolkit",
+        long_description=long_description,
+        author="Mathieu Fenniak",
+        author_email="biziqe@mathieu.fenniak.net",
+        url="http://pybrary.net/pyPdf/",
+        download_url="http://pybrary.net/pyPdf/pyPdf-1.12.tar.gz",
+        classifiers = [
+            "Development Status :: 5 - Production/Stable",
+            "Intended Audience :: Developers",
+            "License :: OSI Approved :: BSD License",
+            "Programming Language :: Python",
+            "Operating System :: OS Independent",
+            "Topic :: Software Development :: Libraries :: Python Modules",
+            ],
+        packages=["pyPdf"],
+    )
+