pdfrw (0.1-3) unstable; urgency=medium

* QA upload.
* Build using dh_python2

# imported from the archive

2014-07-13 17:50:59 +02:00
commit a1959ba9c0
49 changed files with 3407 additions and 0 deletions
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -0,0 +1,21 @@
+pdfrw (pdfrw.googlecode.com)
+
+Copyright (c) 2006-2012 Patrick Maupin
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
--- a/README.txt
+++ b/README.txt
@@ -0,0 +1,3 @@
+pdfrw reads and writes PDF files.
+
+More info at http://code.google.com/p/pdfrw
--- a/debian/changelog
+++ b/debian/changelog
@@ -0,0 +1,45 @@
+pdfrw (0.1-3) unstable; urgency=medium
+
+  * QA upload.
+  * Build using dh_python2
+
+ -- Matthias Klose <doko@debian.org>  Sun, 13 Jul 2014 15:50:59 +0000
+
+pdfrw (0.1-2) unstable; urgency=medium
+
+  * Orphaning package.
+
+ -- Chris Lamb <lamby@debian.org>  Sun, 09 Feb 2014 00:05:27 +0000
+
+pdfrw (0.1-1) unstable; urgency=low
+
+  * New upstream release.
+
+ -- Chris Lamb <lamby@debian.org>  Tue, 16 Oct 2012 07:54:53 +0100
+
+pdfrw (0+svn136-4) unstable; urgency=low
+
+  * Correct Homepage field. (Closes: #683165)
+  * Specify a 'name' kwarg in call to setuptools.setup.
+
+ -- Chris Lamb <lamby@debian.org>  Tue, 31 Jul 2012 02:41:14 -0700
+
+pdfrw (0+svn136-3) unstable; urgency=low
+
+  * python-pdfrw should Replaces/Provides/Conflicts pdfrw. Thanks to intrigeri
+    <intrigeri@boum.org>. (Closes: #639273)
+
+ -- Chris Lamb <lamby@debian.org>  Fri, 26 Aug 2011 10:48:38 +0100
+
+pdfrw (0+svn136-2) unstable; urgency=low
+
+  * Rename binary package to "python-pdfrw".
+  * Change Section to "python".
+
+ -- Chris Lamb <lamby@debian.org>  Tue, 23 Aug 2011 15:17:20 +0100
+
+pdfrw (0+svn136-1) unstable; urgency=low
+
+  * Initial release. (Closes: #638862)
+
+ -- Chris Lamb <lamby@debian.org>  Mon, 22 Aug 2011 16:09:03 +0100
--- a/debian/compat
+++ b/debian/compat
@@ -0,0 +1 @@
+7
--- a/debian/control
+++ b/debian/control
@@ -0,0 +1,32 @@
+Source: pdfrw
+Section: python
+Priority: optional
+Maintainer: Debian QA Group <packages@qa.debian.org>
+Build-Depends: debhelper (>= 7.0.50~)
+Build-Depends-Indep: python-setuptools
+Standards-Version: 3.9.2
+Homepage: http://code.google.com/p/pdfrw/
+Vcs-Git: git://github.com/lamby/pkg-pdfrw.git
+Vcs-Browser: https://github.com/lamby/pkg-pdfrw
+
+Package: python-pdfrw
+Architecture: all
+Depends: ${misc:Depends}, ${python:Depends}, python-reportlab
+Replaces: pdfrw
+Provides: pdfrw
+Conflicts: pdfrw
+Description: PDF file manipulation library
+ pdfrw can read and write PDF files, and can also be used to read in PDFs which
+ can then be used inside reportlab.
+ .
+ pdfrw tries to be agnostic about the contents of PDF files, and support them
+ as containers, but to do useful work, something a little higher-level is
+ required. It supports the following:
+ .
+  * PDF pages. pdfrw knows enough to find the pages in PDF files you read in,
+    and to write a set of pages back out to a new PDF file.
+  * Form XObjects. pdfrw can take any page or rectangle on a page, and convert
+    it to a Form XObject, suitable for use inside another PDF file.
+  * reportlab objects. pdfrw can recursively create a set of reportlab objects
+    from its internal object format. This allows, for example, Form XObjects to
+    be used inside reportlab.
--- a/debian/copyright
+++ b/debian/copyright
@@ -0,0 +1,44 @@
+Author: Patrick Maupin
+Download: http://code.google.com/p/pdfrw/
+
+Files: *
+Copyright: © 2006-2009 Patrick Maupin
+License: MIT
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+ .
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+ .
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
+
+Files: debian/*
+Copyright: © 2011 Chris Lamb <chris@chris-lamb.co.uk>
+License: MIT
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+ .
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+ .
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
--- a/debian/examples
+++ b/debian/examples
@@ -0,0 +1 @@
+examples/*
--- a/debian/rules
+++ b/debian/rules
@@ -0,0 +1,4 @@
+#!/usr/bin/make -f
+
+%:
+	dh $@ --with python2
--- a/debian/source/format
+++ b/debian/source/format
@@ -0,0 +1 @@
+3.0 (quilt)
--- a/examples/4up.py
+++ b/examples/4up.py
@@ -0,0 +1,51 @@
+#!/usr/bin/env python
+
+'''
+usage:   4up.py my.pdf
+
+Creates 4up.my.pdf
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter, PdfDict, PdfName, PdfArray
+from pdfrw.buildxobj import pagexobj
+
+def get4(allpages):
+    # Pull a maximum of 4 pages off the list
+    pages = [pagexobj(x) for x in allpages[:4]]
+    del allpages[:4]
+
+    x_max = max(page.BBox[2] for page in pages)
+    y_max = max(page.BBox[3] for page in pages)
+
+    stream = []
+    xobjdict = PdfDict()
+    for index, page in enumerate(pages):
+        x = x_max * (index & 1) / 2.0
+        y = y_max * (index <= 1) / 2.0
+        index = '/P%s' % index
+        stream.append('q 0.5 0 0 0.5 %s %s cm %s Do Q\n' % (x, y, index))
+        xobjdict[index] = page
+
+    return PdfDict(
+        Type = PdfName.Page,
+        Contents = PdfDict(stream=''.join(stream)),
+        MediaBox = PdfArray([0, 0, x_max, y_max]),
+        Resources = PdfDict(XObject = xobjdict),
+    )
+
+def go(inpfn, outfn):
+    pages = PdfReader(inpfn).pages
+    writer = PdfWriter()
+    while pages:
+        writer.addpage(get4(pages))
+    writer.write(outfn)
+
+if __name__ == '__main__':
+    inpfn, = sys.argv[1:]
+    outfn = '4up.' + os.path.basename(inpfn)
+    go(inpfn, outfn)
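
The placement arithmetic in get4() can be checked without pdfrw. The sketch below is not part of the commit: it only reproduces the offset computation and the content-stream line, working on plain numbers instead of Form XObjects. The helper name four_up_stream is made up for illustration.

```python
def four_up_stream(x_max, y_max, count=4):
    """Return (offsets, stream) for up to four half-size pages.

    Hypothetical helper: mirrors the loop in get4() but takes plain
    numbers instead of pdfrw Form XObjects.
    """
    offsets = []
    stream = []
    for index in range(count):
        # Odd indexes go in the right-hand column.
        x = x_max * (index & 1) / 2.0
        # Indexes 0 and 1 go in the top row (the PDF y axis points up).
        y = y_max * (index <= 1) / 2.0
        name = '/P%s' % index
        offsets.append((x, y))
        # q/Q save and restore the graphics state, cm scales by 0.5
        # and translates, Do paints the named XObject.
        stream.append('q 0.5 0 0 0.5 %s %s cm %s Do Q\n' % (x, y, name))
    return offsets, ''.join(stream)
```

For a 612 x 792 point page, index 0 lands at (0, 396), which is the top-left quadrant, and index 3 lands at (306, 0), the bottom-right one.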
--- a/examples/README.txt
+++ b/examples/README.txt
@@ -0,0 +1,32 @@
+Example programs:
+
+4up.py -- Prints pages four-up
+
+alter.py -- Simple example of making a very slight modification to a PDF.
+
+booklet.py -- Converts a PDF into a booklet.
+
+metadata.py -- Concatenates multiple PDFs, adds metadata.
+
+poster.py -- Changes the size of a PDF to create a poster
+
+print_two.py -- Prints two cut-down copies on a single sheet of paper (double-sided only).  Requires uncompressed PDF.
+
+rotate.py -- This will rotate selected ranges of pages within a document.
+
+subset.py -- This will retrieve a subset of pages from a document.
+
+watermark.py  -- Adds a watermark to a PDF
+
+rl1/4up.py -- Same as 4up.py, using reportlab for output.  Next simplest reportlab example.
+
+rl1/booklet.py -- Same as booklet.py, using reportlab for output.
+
+rl1/platypus_pdf_template.py -- Example using a PDF page as a watermark background with reportlab.
+
+rl1/subset.py -- Same as subset.py, using reportlab for output.  Simplest reportlab example.
+
+rl2/copy.py -- example of how you could parse a graphics stream and then use reportlab for output.
+               Works on a few different PDFs, probably not a suitable starting point for real
+               production work without a lot of work on the library functions.
+
--- a/examples/alter.py
+++ b/examples/alter.py
@@ -0,0 +1,25 @@
+#!/usr/bin/env python
+
+'''
+usage:   alter.py my.pdf
+
+Creates alter.my.pdf
+
+Demonstrates making a slight alteration to a preexisting PDF file.
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter
+
+inpfn, = sys.argv[1:]
+outfn = 'alter.' + os.path.basename(inpfn)
+
+trailer = PdfReader(inpfn)
+trailer.Info.Title = 'My New Title Goes Here'
+writer = PdfWriter()
+writer.trailer = trailer
+writer.write(outfn)
--- a/examples/booklet.py
+++ b/examples/booklet.py
@@ -0,0 +1,65 @@
+#!/usr/bin/env python
+
+'''
+usage:   booklet.py my.pdf
+
+Creates booklet.my.pdf
+
+Pages organized in a form suitable for booklet printing.
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter, PdfDict, PdfArray, PdfName, IndirectPdfDict
+from pdfrw.buildxobj import pagexobj
+
+def fixpage(*pages):
+    pages = [pagexobj(x) for x in pages]
+
+    class PageStuff(tuple):
+        pass
+
+    x = y = 0
+    for i, page in enumerate(pages):
+        index = '/P%s' % i
+        shift_right = x and '1 0 0 1 %s 0 cm ' % x or ''
+        stuff = PageStuff((index, page))
+        stuff.stream = 'q %s%s Do Q\n' % (shift_right, index)
+        x += page.BBox[2]
+        y = max(y, page.BBox[3])
+        pages[i] = stuff
+
+    # Multiple copies of first page used as a placeholder to
+    # get blank page on back.
+    for p1, p2 in zip(pages, pages[1:]):
+        if p1[1] is p2[1]:
+            pages.remove(p1)
+
+    return IndirectPdfDict(
+        Type = PdfName.Page,
+        Contents = PdfDict(stream=''.join(page.stream for page in pages)),
+        MediaBox = PdfArray([0, 0, x, y]),
+        Resources = PdfDict(
+            XObject = PdfDict(pages),
+        ),
+    )
+
+inpfn, = sys.argv[1:]
+outfn = 'booklet.' + os.path.basename(inpfn)
+pages = PdfReader(inpfn).pages
+
+# Use page1 as a marker to print a blank at the end
+if len(pages) & 1:
+    pages.append(pages[0])
+
+bigpages = []
+while len(pages) > 2:
+    bigpages.append(fixpage(pages.pop(), pages.pop(0)))
+    bigpages.append(fixpage(pages.pop(0), pages.pop()))
+
+bigpages += pages
+
+PdfWriter().addpages(bigpages).write(outfn)
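
The page pairing done by the while loop in booklet.py is easier to follow on plain page numbers. This is a minimal sketch, not code from the commit: booklet_order is a hypothetical name, and it uses None as the blank-page marker where the script reuses the first page object.

```python
def booklet_order(page_count):
    """Return the page numbers placed on each output page, in order.

    Hypothetical helper: follows the pop() sequence of booklet.py on
    1-based page numbers instead of pdfrw page objects.
    """
    pages = list(range(1, page_count + 1))
    # Pad to an even count; None stands in for the blank back page.
    if len(pages) & 1:
        pages.append(None)
    sheets = []
    while len(pages) > 2:
        # One side pairs the last remaining page with the first ...
        sheets.append((pages.pop(), pages.pop(0)))
        # ... the other side pairs the next first with the next last.
        sheets.append((pages.pop(0), pages.pop()))
    # Whatever is left (at most two pages) is emitted unpaired,
    # as "bigpages += pages" does in the script.
    sheets.extend((page,) for page in pages)
    return sheets
```

With eight pages the loop consumes everything in pairs. With six pages the two middle pages are left over and written out as single pages.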
--- a/examples/find_pdfrw.py
+++ b/examples/find_pdfrw.py
@@ -0,0 +1,33 @@
+'''
+    find_xxx.py -- Find the place in the tree where xxx lives.
+
+    Ways to use:
+                1) Make a copy, change 'xxx' in package to be your name; or
+                2) Under Linux, just ln -s to where this is in the right tree
+
+    Created by Pat Maupin, who doesn't consider it big enough to be worth copyrighting
+'''
+
+import sys
+import os
+
+myname = __name__[5:]   # remove 'find_'
+myname = os.path.join(myname, '__init__.py')
+
+def trypath(newpath):
+    path = None
+    while path != newpath:
+        path = newpath
+        if os.path.exists(os.path.join(path, myname)):
+            return path
+        newpath = os.path.dirname(path)
+
+root = trypath(__file__) or trypath(os.path.realpath(__file__))
+
+if root is None:
+    print
+    print 'Warning: %s: Could not find path to development package %s' % (__file__, myname)
+    print '             The import will either fail or will use system-installed libraries'
+    print
+elif root not in sys.path:
+    sys.path.append(root)
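
The upward search in trypath() can be tried in isolation. The sketch below is an illustration rather than code from the commit: find_package_root is a hypothetical name, and it takes the package name as an argument instead of deriving it from __name__.

```python
import os


def find_package_root(start_path, package_name):
    """Walk up from start_path until <dir>/<package_name>/__init__.py exists.

    Hypothetical helper modelled on trypath(); returns the directory
    that contains the package, or None if the filesystem root is
    reached without finding it.
    """
    marker = os.path.join(package_name, '__init__.py')
    path = None
    newpath = start_path
    # dirname() of the filesystem root is the root itself, so the loop
    # stops once the path no longer changes.
    while path != newpath:
        path = newpath
        if os.path.exists(os.path.join(path, marker)):
            return path
        newpath = os.path.dirname(path)
    return None
```

The returned directory is what find_pdfrw.py appends to sys.path, which lets the examples import pdfrw from a source checkout before falling back to an installed copy.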
--- a/examples/metadata.py
+++ b/examples/metadata.py
@@ -0,0 +1,39 @@
+#!/usr/bin/env python
+
+'''
+usage:   metadata.py <first.pdf> [<next.pdf> ...]
+
+Creates output.pdf
+
+This file demonstrates two features:
+
+1) Concatenating multiple input PDFs.
+
+2) adding metadata to the PDF.
+
+If you do not need to add metadata, look at subset.py, which
+has a simpler interface to PdfWriter.
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter, IndirectPdfDict
+
+inputs = sys.argv[1:]
+assert inputs
+outfn = 'output.pdf'
+
+writer = PdfWriter()
+for inpfn in inputs:
+    writer.addpages(PdfReader(inpfn).pages)
+
+writer.trailer.Info = IndirectPdfDict(
+    Title = 'your title goes here',
+    Author = 'your name goes here',
+    Subject = 'what is it all about?',
+    Creator = 'some script goes here',
+)
+writer.write(outfn)
--- a/examples/poster.py
+++ b/examples/poster.py
@@ -0,0 +1,57 @@
+#!/usr/bin/env python
+
+'''
+usage:   poster.py my.pdf
+
+Shows how to change the size on a PDF.
+
+Motivation:
+
+My daughter needed to create a 48" x 36" poster, but her Mac version of Powerpoint
+only wanted to output 8.5" x 11" for some reason.
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter, PdfDict, PdfName, PdfArray, IndirectPdfDict
+from pdfrw.buildxobj import pagexobj
+
+def adjust(page):
+    page = pagexobj(page)
+    assert page.BBox == [0, 0, 11 * 72, int(8.5 * 72)], page.BBox
+    margin = 72 // 2
+    old_x, old_y = page.BBox[2] - 2 * margin, page.BBox[3] - 2 * margin
+
+    new_x, new_y = 48 * 72, 36 * 72
+    ratio = 1.0 * new_x / old_x
+    assert ratio == 1.0 * new_y / old_y
+
+    index = '/BasePage'
+    x = -margin * ratio
+    y = -margin * ratio
+    stream = 'q %0.2f 0 0 %0.2f %s %s cm %s Do Q\n' % (ratio, ratio, x, y, index)
+    xobjdict = PdfDict()
+    xobjdict[index] = page
+
+    return PdfDict(
+        Type = PdfName.Page,
+        Contents = PdfDict(stream=stream),
+        MediaBox = PdfArray([0, 0, new_x, new_y]),
+        Resources = PdfDict(XObject = xobjdict),
+    )
+
+def go(inpfn, outfn):
+    reader = PdfReader(inpfn)
+    page, = reader.pages
+    writer = PdfWriter()
+    writer.addpage(adjust(page))
+    writer.trailer.Info = IndirectPdfDict(reader.Info)
+    writer.write(outfn)
+
+if __name__ == '__main__':
+    inpfn, = sys.argv[1:]
+    outfn = 'poster.' + os.path.basename(inpfn)
+    go(inpfn, outfn)
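
The scale factor and offset computed in adjust() can be reproduced with plain numbers. A minimal sketch, assuming the same margin trimming as the script: poster_transform is a hypothetical name, and unlike the script it compares the two ratios with a tolerance instead of exact float equality.

```python
def poster_transform(bbox_w, bbox_h, margin, new_w, new_h):
    """Return (ratio, x, y) for scaling a trimmed page onto a poster.

    Hypothetical helper: all sizes are in PDF points (1/72 inch).
    """
    # Size of the area that remains once the margin is trimmed
    # from every side.
    old_w = bbox_w - 2 * margin
    old_h = bbox_h - 2 * margin
    ratio = 1.0 * new_w / old_w
    # Assumption: a small tolerance is acceptable here; adjust()
    # itself asserts exact equality.
    if abs(ratio - 1.0 * new_h / old_h) > 1e-9:
        raise ValueError('source and target aspect ratios differ')
    # A negative offset pushes the scaled margin outside the new MediaBox.
    offset = -margin * ratio
    return ratio, offset, offset
```

For the landscape 11" x 8.5" page the script expects, with a half-inch margin and a 48" x 36" target, the ratio comes out at about 4.8.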
--- a/examples/print_two.py
+++ b/examples/print_two.py
@@ -0,0 +1,58 @@
+#!/usr/bin/env python
+
+'''
+usage:   print_two.py my.pdf
+
+Creates print_two.my.pdf
+
+This is only useful when you can cut down sheets of paper to make two
+small documents.  Works for double-sided only right now.
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter, PdfArray, IndirectPdfDict
+
+def fixpage(page, count=[0]):
+    count[0] += 1
+    evenpage = not (count[0] & 1)
+
+    # For demo purposes, just go with the MediaBox and toast the others
+    box = [float(x) for x in page.MediaBox]
+    assert box[0] == box[1] == 0, "demo won't work on this PDF"
+
+    for key, value in sorted(page.iteritems()):
+        if 'box' in key.lower():
+            del page[key]
+
+    startsize = tuple(box[2:])
+    finalsize = box[3], 2 * box[2]
+    page.MediaBox = PdfArray((0, 0) + finalsize)
+    page.Rotate = (int(page.Rotate or 0) + 90) % 360
+
+    contents = page.Contents
+    if contents is None:
+        return page
+    contents = isinstance(contents, dict) and [contents] or contents
+
+    prefix = '0 1 -1 0 %s %s cm\n' % (finalsize[0], 0)
+    if evenpage:
+        prefix = '1 0 0 1 %s %s cm\n' % (0, finalsize[1]/2) +  prefix
+    first_prefix = 'q\n-1 0 0 -1 %s %s cm\n' % finalsize + prefix
+    second_prefix = '\nQ\n' + prefix
+    first_prefix = IndirectPdfDict(stream=first_prefix)
+    second_prefix = IndirectPdfDict(stream=second_prefix)
+    contents = PdfArray(([second_prefix] + contents) * 2)
+    contents[0] = first_prefix
+    page.Contents = contents
+    return page
+
+
+inpfn, = sys.argv[1:]
+outfn = 'print_two.' + os.path.basename(inpfn)
+pages = PdfReader(inpfn).pages
+
+PdfWriter().addpages(fixpage(x) for x in pages).write(outfn)
--- a/examples/rl1/4up.py
+++ b/examples/rl1/4up.py
@@ -0,0 +1,57 @@
+#!/usr/bin/env python
+
+'''
+usage:   4up.py my.pdf
+
+
+Uses Form XObjects and reportlab to create 4up.my.pdf.
+
+Demonstrates use of pdfrw with reportlab.
+
+'''
+
+import sys
+import os
+
+from reportlab.pdfgen.canvas import Canvas
+
+import find_pdfrw
+from pdfrw import PdfReader
+from pdfrw.buildxobj import pagexobj
+from pdfrw.toreportlab import makerl
+
+
+def addpage(canvas, allpages):
+    pages = allpages[:4]
+    del allpages[:4]
+
+    x_max = max(page.BBox[2] for page in pages)
+    y_max = max(page.BBox[3] for page in pages)
+
+    canvas.setPageSize((x_max, y_max))
+
+    for index, page in enumerate(pages):
+        x = x_max * (index & 1) / 2.0
+        y = y_max * (index <= 1) / 2.0
+        canvas.saveState()
+        canvas.translate(x, y)
+        canvas.scale(0.5, 0.5)
+        canvas.doForm(makerl(canvas, page))
+        canvas.restoreState()
+    canvas.showPage()
+
+
+def go(argv):
+    inpfn, = argv
+    outfn = '4up.' + os.path.basename(inpfn)
+
+    pages = PdfReader(inpfn).pages
+    pages = [pagexobj(x) for x in pages]
+    canvas = Canvas(outfn)
+
+    while pages:
+        addpage(canvas, pages)
+    canvas.save()
+
+if __name__ == '__main__':
+    go(sys.argv[1:])
--- a/examples/rl1/README.txt
+++ b/examples/rl1/README.txt
@@ -0,0 +1,9 @@
+This directory contains example scripts which read in PDFs
+and convert pages to PDF Form XObjects using pdfrw, and then
+write out the PDFs using reportlab.
+
+The examples, from easiest to hardest, are:
+
+subset.py -- prints a subset of pages
+4up.py -- prints pages 4-up
+booklet.py -- creates a booklet out of the pages
--- a/examples/rl1/booklet.py
+++ b/examples/rl1/booklet.py
@@ -0,0 +1,69 @@
+#!/usr/bin/env python
+
+'''
+usage:   booklet.py my.pdf
+
+
+Uses Form XObjects and reportlab to create booklet.my.pdf.
+
+Demonstrates use of pdfrw with reportlab.
+
+'''
+
+import sys
+import os
+
+from reportlab.pdfgen.canvas import Canvas
+
+import find_pdfrw
+from pdfrw import PdfReader
+from pdfrw.buildxobj import pagexobj
+from pdfrw.toreportlab import makerl
+
+
+def read_and_double(inpfn):
+    pages = PdfReader(inpfn).pages
+    pages = [pagexobj(x) for x in pages]
+    if len(pages) & 1:
+        pages.append(pages[0])  # Sentinel -- get same size for back as front
+
+    xobjs = []
+    while len(pages) > 2:
+        xobjs.append((pages.pop(), pages.pop(0)))
+        xobjs.append((pages.pop(0), pages.pop()))
+    xobjs += [(x,) for x in pages]
+    return xobjs
+
+
+def make_pdf(outfn, xobjpairs):
+    canvas = Canvas(outfn)
+    for xobjlist in xobjpairs:
+        x = y = 0
+        for xobj in xobjlist:
+            x += xobj.BBox[2]
+            y = max(y, xobj.BBox[3])
+
+        canvas.setPageSize((x,y))
+
+        # Handle blank back page
+        if len(xobjlist) > 1 and xobjlist[0] == xobjlist[-1]:
+            xobjlist = xobjlist[:1]
+            x = xobjlist[0].BBox[2]
+        else:
+            x = 0
+        y = 0
+
+        for xobj in xobjlist:
+            canvas.saveState()
+            canvas.translate(x, y)
+            canvas.doForm(makerl(canvas, xobj))
+            canvas.restoreState()
+            x += xobj.BBox[2]
+        canvas.showPage()
+    canvas.save()
+
+
+inpfn, = sys.argv[1:]
+outfn = 'booklet.' + os.path.basename(inpfn)
+
+make_pdf(outfn, read_and_double(inpfn))
--- a/examples/rl1/find_pdfrw.py
+++ b/examples/rl1/find_pdfrw.py
@@ -0,0 +1,33 @@
+'''
+    find_xxx.py -- Find the place in the tree where xxx lives.
+
+    Ways to use:
+                1) Make a copy, change 'xxx' in package to be your name; or
+                2) Under Linux, just ln -s to where this is in the right tree
+
+    Created by Pat Maupin, who doesn't consider it big enough to be worth copyrighting
+'''
+
+import sys
+import os
+
+myname = __name__[5:]   # remove 'find_'
+myname = os.path.join(myname, '__init__.py')
+
+def trypath(newpath):
+    path = None
+    while path != newpath:
+        path = newpath
+        if os.path.exists(os.path.join(path, myname)):
+            return path
+        newpath = os.path.dirname(path)
+
+root = trypath(__file__) or trypath(os.path.realpath(__file__))
+
+if root is None:
+    print
+    print 'Warning: %s: Could not find path to development package %s' % (__file__, myname)
+    print '             The import will either fail or will use system-installed libraries'
+    print
+elif root not in sys.path:
+    sys.path.append(root)
--- a/examples/rl1/platypus_pdf_template.py
+++ b/examples/rl1/platypus_pdf_template.py
@@ -0,0 +1,106 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+usage: platypus_pdf_template.py output.pdf pdf_file_to_use_as_template.pdf
+
+Example of using pdfrw to use a pdf (page one) as the background for all
+other pages together with platypus.
+
+There is a table of contents in this example for completeness' sake.
+
+Contributed by user asannes
+
+"""
+import sys
+
+from reportlab.platypus import PageTemplate, BaseDocTemplate, Frame
+from reportlab.platypus import NextPageTemplate, Paragraph, PageBreak
+from reportlab.platypus.tableofcontents import TableOfContents
+from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
+from reportlab.rl_config import defaultPageSize
+from reportlab.lib.units import inch
+from reportlab.graphics import renderPDF
+
+import find_pdfrw
+from pdfrw import PdfReader
+from pdfrw.buildxobj import pagexobj
+from pdfrw.toreportlab import makerl
+
+PAGE_WIDTH = defaultPageSize[0]
+PAGE_HEIGHT = defaultPageSize[1]
+
+class MyTemplate(PageTemplate):
+    """The kernel of this example, where we use pdfrw to fill in the
+    background of a page before writing to it.  This could be used to fill
+    in a watermark or similar."""
+
+    def __init__(self, pdf_template_filename, name=None):
+        frames = [Frame(
+            0.85 * inch,
+            0.5 * inch,
+            PAGE_WIDTH - 1.15 * inch,
+            PAGE_HEIGHT - (1.5 * inch)
+            )]
+        PageTemplate.__init__(self, name, frames)
+        # use first page as template
+        page = PdfReader(pdf_template_filename).pages[0]
+        self.page_template = pagexobj(page)
+        # Scale it to fill the complete page
+        self.page_xscale = PAGE_WIDTH/self.page_template.BBox[2]
+        self.page_yscale = PAGE_HEIGHT/self.page_template.BBox[3]
+
+    def beforeDrawPage(self, canvas, doc):
+        """Draws the background before anything else"""
+        canvas.saveState()
+        rl_obj = makerl(canvas, self.page_template)
+        canvas.scale(self.page_xscale, self.page_yscale)
+        canvas.doForm(rl_obj)
+        canvas.restoreState()
+
+class MyDocTemplate(BaseDocTemplate):
+    """Used to apply heading to table of contents."""
+
+    def afterFlowable(self, flowable):
+        """Adds Heading1 to table of contents"""
+        if flowable.__class__.__name__ == 'Paragraph':
+            style = flowable.style.name
+            text = flowable.getPlainText()
+            key = '%s' % self.seq.nextf('toc')
+            if style == 'Heading1':
+                self.canv.bookmarkPage(key)
+                self.notify('TOCEntry', [1, text, self.page, key])
+
+def create_toc():
+    """Creates the table of contents"""
+    table_of_contents = TableOfContents()
+    table_of_contents.dotsMinLevel = 0
+    header1 = ParagraphStyle(name = 'Heading1', fontSize = 16, leading = 16)
+    header2 = ParagraphStyle(name = 'Heading2', fontSize = 14, leading = 14)
+    table_of_contents.levelStyles = [header1, header2]
+    return [table_of_contents, PageBreak()]
+
+def create_pdf(filename, pdf_template_filename):
+    """Create the pdf, with all the contents"""
+    pdf_report = open(filename, "wb")
+    document = MyDocTemplate(pdf_report)
+    templates = [ MyTemplate(pdf_template_filename, name='background') ]
+    document.addPageTemplates(templates)
+
+    styles = getSampleStyleSheet()
+    elements = [NextPageTemplate('background')]
+    elements.extend(create_toc())
+
+    # Dummy content (hello world x 200)
+    for i in range(200):
+        elements.append(Paragraph("Hello World" + str(i), styles['Heading1']))
+
+    document.multiBuild(elements)
+    pdf_report.close()
+
+
+if __name__ == '__main__':
+    try:
+        output, template = sys.argv[1:]
+        create_pdf(output, template)
+    except ValueError:
+        print "Usage: %s <output> <template>" % (sys.argv[0])
--- a/examples/rl1/subset.py
+++ b/examples/rl1/subset.py
@@ -0,0 +1,43 @@
+#!/usr/bin/env python
+
+'''
+usage:   subset.py my.pdf firstpage lastpage
+
+Creates subset_<pagenum>_to_<pagenum>.my.pdf
+
+
+Uses Form XObjects and reportlab to create output file.
+
+Demonstrates use of pdfrw with reportlab.
+
+'''
+
+import sys
+import os
+
+from reportlab.pdfgen.canvas import Canvas
+
+import find_pdfrw
+from pdfrw import PdfReader
+from pdfrw.buildxobj import pagexobj
+from pdfrw.toreportlab import makerl
+
+
+def go(inpfn, firstpage, lastpage):
+    firstpage, lastpage = int(firstpage), int(lastpage)
+    outfn = 'subset_%s_to_%s.%s' % (firstpage, lastpage, os.path.basename(inpfn))
+
+    pages = PdfReader(inpfn).pages
+    pages = [pagexobj(x) for x in pages[firstpage-1:lastpage]]
+    canvas = Canvas(outfn)
+
+    for page in pages:
+        canvas.setPageSize(tuple(page.BBox[2:]))
+        canvas.doForm(makerl(canvas, page))
+        canvas.showPage()
+
+    canvas.save()
+
+if __name__ == '__main__':
+    inpfn, firstpage, lastpage = sys.argv[1:]
+    go(inpfn, firstpage, lastpage)
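
The command line takes a 1-based, inclusive page range, while the slice in go() is 0-based and half-open. The sketch below shows that conversion and the output file name on a plain list. It is illustrative only, and subset_plan is a hypothetical name.

```python
import os


def subset_plan(inpfn, firstpage, lastpage, pages):
    """Return (output file name, selected pages) as go() would pick them.

    Hypothetical helper: 'pages' is any list standing in for
    PdfReader(inpfn).pages.
    """
    # Arguments arrive as strings from sys.argv.
    firstpage, lastpage = int(firstpage), int(lastpage)
    outfn = 'subset_%s_to_%s.%s' % (
        firstpage, lastpage, os.path.basename(inpfn))
    # Page N lives at index N - 1; the slice end is exclusive, so
    # 'lastpage' itself is included.
    return outfn, pages[firstpage - 1:lastpage]
```

Asking for pages 2 to 4 of a five-page list therefore selects the items at indexes 1, 2 and 3.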
--- a/examples/rl2/README.txt
+++ b/examples/rl2/README.txt
@@ -0,0 +1,5 @@
+The copy.py demo in this directory parses the graphics stream from the PDF and actually plays it back through reportlab.
+
+Doesn't yet handle fonts or unicode very well.
+
+For a more practical demo, look at the Form XObjects approach in the examples/rl1 directory.
--- a/examples/rl2/copy.py
+++ b/examples/rl2/copy.py
@@ -0,0 +1,32 @@
+#!/usr/bin/env python
+
+'''
+usage:   copy.py my.pdf
+
+Creates copy.my.pdf
+
+Uses somewhat-functional parser.  For better results
+for most things, see the Form XObject-based method.
+
+'''
+
+import sys
+import os
+
+from reportlab.pdfgen.canvas import Canvas
+
+from decodegraphics import parsepage
+from pdfrw import PdfReader, PdfWriter, PdfArray
+
+inpfn, = sys.argv[1:]
+outfn = 'copy.' + os.path.basename(inpfn)
+pages = PdfReader(inpfn).pages
+canvas = Canvas(outfn, pageCompression=0)
+
+for page in pages:
+    box = [float(x) for x in page.MediaBox]
+    assert box[0] == box[1] == 0, "demo won't work on this PDF"
+    canvas.setPageSize(box[2:])
+    parsepage(page, canvas)
+    canvas.showPage()
+canvas.save()
--- a/examples/rl2/decodegraphics.py
+++ b/examples/rl2/decodegraphics.py
@@ -0,0 +1,378 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+This file is an example parser that will parse a graphics stream
+into a reportlab canvas.
+
+Needs work on fonts and unicode, but works on a few PDFs.
+
+Better to use Form XObjects for most things (see the example in rl1).
+
+'''
+from inspect import getargspec
+
+import find_pdfrw
+from pdfrw import PdfTokens
+from pdfrw.pdfobjects import PdfString
+
+#############################################################################
+# Graphics parsing
+
+def parse_array(self, token='[', params=None):
+    mylist = []
+    for token in self.tokens:
+        if token == ']':
+            break
+        mylist.append(token)
+    self.params.append(mylist)
+
+def parse_savestate(self, token='q', params=''):
+    self.canv.saveState()
+
+def parse_restorestate(self, token='Q', params=''):
+    self.canv.restoreState()
+
+def parse_transform(self, token='cm', params='ffffff'):
+    self.canv.transform(*params)
+
+def parse_linewidth(self, token='w', params='f'):
+    self.canv.setLineWidth(*params)
+
+def parse_linecap(self, token='J', params='i'):
+    self.canv.setLineCap(*params)
+
+def parse_linejoin(self, token='j', params='i'):
+    self.canv.setLineJoin(*params)
+
+def parse_miterlimit(self, token='M', params='f'):
+    self.canv.setMiterLimit(*params)
+
+def parse_dash(self, token='d', params='as'):  # Array, string
+    self.canv.setDash(*params)
+
+def parse_intent(self, token='ri', params='n'):
+    # TODO: add logging
+    pass
+
+def parse_flatness(self, token='i', params='i'):
+    # TODO: add logging
+    pass
+
+def parse_gstate(self, token='gs', params='n'):
+    # TODO: add logging
+    # Could parse stuff we care about from here later
+    pass
+
+def parse_move(self, token='m', params='ff'):
+    if self.gpath is None:
+        self.gpath = self.canv.beginPath()
+    self.gpath.moveTo(*params)
+    self.current_point = params
+
+def parse_line(self, token='l', params='ff'):
+    self.gpath.lineTo(*params)
+    self.current_point = params
+
+def parse_curve(self, token='c', params='ffffff'):
+    self.gpath.curveTo(*params)
+    self.current_point = params[-2:]
+
+def parse_curve1(self, token='v', params='ffff'):
+    parse_curve(self, token, tuple(self.current_point) + tuple(params))
+
+def parse_curve2(self, token='y', params='ffff'):
+    parse_curve(self, token, tuple(params) + tuple(params[-2:]))
+
+def parse_close(self, token='h', params=''):
+    self.gpath.close()
+
+def parse_rect(self, token='re', params='ffff'):
+    if self.gpath is None:
+        self.gpath = self.canv.beginPath()
+    self.gpath.rect(*params)
+    self.current_point = params[-2:]
+
+def parse_stroke(self, token='S', params=''):
+    finish_path(self, 1, 0, 0)
+
+def parse_close_stroke(self, token='s', params=''):
+    self.gpath.close()
+    finish_path(self, 1, 0, 0)
+
+def parse_fill(self, token='f', params=''):
+    finish_path(self, 0, 1, 1)
+
+def parse_fill_compat(self, token='F', params=''):
+    finish_path(self, 0, 1, 1)
+
+def parse_fill_even_odd(self, token='f*', params=''):
+    finish_path(self, 0, 1, 0)
+
+def parse_fill_stroke_even_odd(self, token='B*', params=''):
+    finish_path(self, 1, 1, 0)
+
+def parse_fill_stroke(self, token='B', params=''):
+    finish_path(self, 1, 1, 1)
+
+def parse_close_fill_stroke_even_odd(self, token='b*', params=''):
+    self.gpath.close()
+    finish_path(self, 1, 1, 0)
+
+def parse_close_fill_stroke(self, token='b', params=''):
+    self.gpath.close()
+    finish_path(self, 1, 1, 1)
+
+def parse_nop(self, token='n', params=''):
+    finish_path(self, 0, 0, 0)
+
+def finish_path(self, stroke, fill, fillmode):
+    if self.gpath is not None:
+        canv = self.canv
+        canv._fillMode, oldmode = fillmode, canv._fillMode
+        canv.drawPath(self.gpath, stroke, fill)
+        canv._fillMode = oldmode
+        self.gpath = None
+
+def parse_clip_path(self, token='W', params=''):
+    # TODO: add logging
+    pass
+
+def parse_clip_path_even_odd(self, token='W*', params=''):
+    # TODO: add logging
+    pass
+
+def parse_stroke_gray(self, token='G', params='f'):
+    self.canv.setStrokeGray(*params)
+
+def parse_fill_gray(self, token='g', params='f'):
+    self.canv.setFillGray(*params)
+
+def parse_stroke_rgb(self, token='RG', params='fff'):
+    self.canv.setStrokeColorRGB(*params)
+
+def parse_fill_rgb(self, token='rg', params='fff'):
+    self.canv.setFillColorRGB(*params)
+
+def parse_stroke_cmyk(self, token='K', params='ffff'):
+    self.canv.setStrokeColorCMYK(*params)
+
+def parse_fill_cmyk(self, token='k', params='ffff'):
+    self.canv.setFillColorCMYK(*params)
+
+#############################################################################
+# Text parsing
+
+def parse_begin_text(self, token='BT', params=''):
+    assert self.tpath is None
+    self.tpath = self.canv.beginText()
+
+def parse_text_transform(self, token='Tm', params='ffffff'):
+    path = self.tpath
+
+    # Small optimization: drop a preceding identity transform (a no-op)
+    try:
+        code = path._code
+    except AttributeError:
+        pass
+    else:
+        if code and code[-1] == '1 0 0 1 0 0 Tm':
+            code.pop()
+
+    path.setTextTransform(*params)
+
+def parse_setfont(self, token='Tf', params='nf'):
+    fontinfo = self.fontdict[params[0]]
+    self.tpath._setFont(fontinfo.name, params[1])
+    self.curfont = fontinfo
+
+def parse_text_out(self, token='Tj', params='t'):
+    text = params[0].decode(self.curfont.remap, self.curfont.twobyte)
+    self.tpath.textOut(text)
+
+def parse_TJ(self, token='TJ', params='a'):
+    remap = self.curfont.remap
+    twobyte = self.curfont.twobyte
+    result = []
+    for x in params[0]:
+        if isinstance(x, PdfString):
+            result.append(x.decode(remap, twobyte))
+        else:
+            # TODO: Adjust spacing between characters here
+            int(x)
+    text = ''.join(result)
+    self.tpath.textOut(text)
+
+def parse_end_text(self, token='ET', params=''):
+    assert self.tpath is not None
+    self.canv.drawText(self.tpath)
+    self.tpath = None
+
+def parse_move_cursor(self, token='Td', params='ff'):
+    self.tpath.moveCursor(params[0], -params[1])
+
+def parse_set_leading(self, token='TL', params='f'):
+    self.tpath.setLeading(*params)
+
+def parse_text_line(self, token='T*', params=''):
+    self.tpath.textLine()
+
+def parse_set_char_space(self, token='Tc', params='f'):
+    self.tpath.setCharSpace(*params)
+
+def parse_set_word_space(self, token='Tw', params='f'):
+    self.tpath.setWordSpace(*params)
+
+def parse_set_hscale(self, token='Tz', params='f'):
+    self.tpath.setHorizScale(params[0] - 100)
+
+def parse_set_rise(self, token='Ts', params='f'):
+    self.tpath.setRise(*params)
+
+def parse_xobject(self, token='Do', params='n'):
+    # TODO: Need to do this
+    pass
+
+class FontInfo(object):
+    ''' Pretty basic -- needs a lot of work to work right for all fonts
+    '''
+    lookup = {
+        'BitstreamVeraSans' : 'Helvetica',   # WRONG -- have to learn about font stuff...
+             }
+
+    def __init__(self, source):
+        name = source.BaseFont[1:]
+        self.name = self.lookup.get(name, name)
+        self.remap = chr
+        self.twobyte = False
+        info = source.ToUnicode
+        if not info:
+            return
+        info = info.stream.split('beginbfchar')[1].split('endbfchar')[0]
+        info = list(PdfTokens(info))
+        assert not len(info) & 1
+        info2 = []
+        for x in info:
+            assert x[0] == '<' and x[-1] == '>' and len(x) in (4,6), x
+            i = int(x[1:-1], 16)
+            info2.append(i)
+        self.remap = dict((x,chr(y)) for (x,y) in zip(info2[::2], info2[1::2])).get
+        self.twobyte = len(info[0]) > 4
+
+#############################################################################
+# Control structures
+
+def findparsefuncs():
+    def checkname(n):
+        assert n.startswith('/')
+        return n
+
+    def checkarray(a):
+        assert isinstance(a, list), a
+        return a
+
+    def checktext(t):
+        assert isinstance(t, PdfString)
+        return t
+
+    fixparam = dict(f=float, i=int, n=checkname, a=checkarray, s=str, t=checktext)
+    fixcache = {}
+    def fixlist(params):
+        try:
+            result = fixcache[params]
+        except KeyError:
+            result = tuple(fixparam[x] for x in params)
+            fixcache[params] = result
+        return result
+
+    dispatch = {}
+    expected_args = 'self token params'.split()
+    for key, func in globals().iteritems():
+        if key.startswith('parse_'):
+            args, varargs, keywords, defaults = getargspec(func)
+            assert args == expected_args and varargs is None \
+                    and keywords is None and len(defaults) == 2, \
+                    (key, args, varargs, keywords, defaults)
+            token, params = defaults
+            if params is not None:
+                params = fixlist(params)
+            value = func, params
+            assert dispatch.setdefault(token, value) is value, repr(token)
+    return dispatch
+
+class _ParseClass(object):
+    dispatch = findparsefuncs()
+
+    @classmethod
+    def parsepage(cls, page, canvas=None):
+        self = cls()
+        contents = page.Contents
+        if contents.Filter is not None:
+            raise SystemExit('Cannot parse graphics -- page encoded with %s' % contents.Filter)
+        dispatch = cls.dispatch.get
+        self.tokens = tokens = iter(PdfTokens(contents.stream))
+        self.params = params = []
+        self.canv = canvas
+        self.gpath = None
+        self.tpath = None
+        self.fontdict = dict((x,FontInfo(y)) for (x, y) in page.Resources.Font.iteritems())
+
+        for token in self.tokens:
+            info = dispatch(token)
+            if info is None:
+                params.append(token)
+                continue
+            func, paraminfo = info
+            if paraminfo is None:
+                func(self, token, ())
+                continue
+            delta = len(params) - len(paraminfo)
+            if delta:
+                if delta < 0:
+                    print 'Operator %s expected %s parameters, got %s' % (token, len(paraminfo), params)
+                    params[:] = []
+                    continue
+                else:
+                    print "Unparsed parameters/commands:", params[:delta]
+                del params[:delta]
+            paraminfo = zip(paraminfo, params)
+            try:
+                params[:] = [x(y) for (x,y) in paraminfo]
+            except:
+                for i, (x,y) in enumerate(paraminfo):
+                    try:
+                        x(y)
+                    except:
+                        raise # For now
+                    continue
+            func(self, token, params)
+            params[:] = []
+
+def debugparser(undisturbed = set('parse_array'.split())):
+    def debugdispatch():
+        def getvalue(oldval):
+            name = oldval[0].__name__
+            def myfunc(self, token, params):
+                print '%s called %s(%s)' % (token, name, ', '.join(str(x) for x in params))
+            if name in undisturbed:
+                myfunc = oldval[0]
+            return myfunc, oldval[1]
+        return dict((x, getvalue(y)) for (x,y) in _ParseClass.dispatch.iteritems())
+
+    class _DebugParse(_ParseClass):
+        dispatch = debugdispatch()
+
+    return _DebugParse.parsepage
+
+parsepage = _ParseClass.parsepage
+
+if __name__ == '__main__':
+    import sys
+    from pdfreader import PdfReader
+    parse = debugparser()
+    fname, = sys.argv[1:]
+    pdf = PdfReader(fname)
+    for i, page in enumerate(pdf.pages):
+        print '\nPage %s ------------------------------------' % i
+        parse(page)
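
Review note: the parsepage loop above queues every token it does not recognize as an operand, and when an operator token arrives it converts the trailing operands and calls the handler. The following is a minimal standalone sketch of that pattern, not the module's code. The names `DISPATCH`, `op_move`, `op_line` and `run` are hypothetical, and the state is a plain dict instead of a ReportLab canvas.

```python
def op_move(state, x, y):
    state['path'].append(('m', x, y))


def op_line(state, x, y):
    state['path'].append(('l', x, y))


# operator token -> (handler, one converter per expected operand)
DISPATCH = {
    'm': (op_move, (float, float)),
    'l': (op_line, (float, float)),
}


def run(tokens):
    state = {'path': []}
    operands = []
    for token in tokens:
        entry = DISPATCH.get(token)
        if entry is None:
            operands.append(token)      # not an operator: queue as operand
            continue
        func, converters = entry
        if len(operands) < len(converters):
            operands = []               # too few operands: skip the operator
            continue
        # Keep only the trailing operands that the operator expects
        args = operands[len(operands) - len(converters):]
        func(state, *[conv(arg) for conv, arg in zip(converters, args)])
        operands = []
    return state['path']


path = run('10 20 m 30.5 40 l'.split())
```

Each handler receives converted copies of its operands, so it can store them without being affected when the operand queue is reset.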
--- a/examples/rl2/find_pdfrw.py
+++ b/examples/rl2/find_pdfrw.py
@ -0,0 +1,33 @@
+'''
+    find_xxx.py -- Find the place in the tree where xxx lives.
+
+    Ways to use:
+                1) Make a copy, change 'xxx' in package to be your name; or
+                2) Under Linux, just ln -s to where this is in the right tree
+
+    Created by Pat Maupin, who doesn't consider it big enough to be worth copyrighting
+'''
+
+import sys
+import os
+
+myname = __name__[5:]   # remove 'find_'
+myname = os.path.join(myname, '__init__.py')
+
+def trypath(newpath):
+    path = None
+    while path != newpath:
+        path = newpath
+        if os.path.exists(os.path.join(path, myname)):
+            return path
+        newpath = os.path.dirname(path)
+
+root = trypath(__file__) or trypath(os.path.realpath(__file__))
+
+if root is None:
+    print
+    print 'Warning: %s: Could not find path to development package %s' % (__file__, myname)
+    print '             The import will either fail or will use system-installed libraries'
+    print
+elif root not in sys.path:
+    sys.path.append(root)
--- a/examples/rotate.py
+++ b/examples/rotate.py
@ -0,0 +1,41 @@
+#!/usr/bin/env python
+
+'''
+usage:   rotate.py my.pdf rotation [page[range] ...]
+         eg. rotate.py my.pdf 270 1-3 5 7-9
+
+        Rotation must be a multiple of 90 degrees, clockwise.
+
+Creates rotate.my.pdf with selected pages rotated.  Rotates all by default.
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter
+
+inpfn = sys.argv[1]
+rotate = sys.argv[2]
+ranges = sys.argv[3:]
+
+rotate = int(rotate)
+assert rotate % 90 == 0
+
+ranges = [[int(y) for y in x.split('-')] for x in ranges]
+outfn = 'rotate.%s' % os.path.basename(inpfn)
+trailer = PdfReader(inpfn)
+pages = trailer.pages
+
+if not ranges:
+    ranges = [[1, len(pages)]]
+
+for onerange in ranges:
+    onerange = (onerange + onerange[-1:])[:2]
+    for pagenum in range(onerange[0]-1, onerange[1]):
+        pages[pagenum].Rotate = (int(pages[pagenum].inheritable.Rotate or 0) + rotate) % 360
+
+outdata = PdfWriter()
+outdata.trailer = trailer
+outdata.write(outfn)
--- a/examples/subset.py
+++ b/examples/subset.py
@ -0,0 +1,30 @@
+#!/usr/bin/env python
+
+'''
+usage:   subset.py my.pdf page[range] [page[range]] ...
+         eg. subset.py my.pdf 1-3 5 7-9
+
+Creates subset.my.pdf
+
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter
+
+inpfn = sys.argv[1]
+ranges = sys.argv[2:]
+assert ranges, "Expected at least one range"
+
+ranges = ([int(y) for y in x.split('-')] for x in ranges)
+outfn = 'subset.%s' % os.path.basename(inpfn)
+pages = PdfReader(inpfn).pages
+outdata = PdfWriter()
+
+for onerange in ranges:
+    onerange = (onerange + onerange[-1:])[:2]
+    for pagenum in range(onerange[0], onerange[1]+1):
+        outdata.addpage(pages[pagenum-1])
+outdata.write(outfn)
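
Review note: rotate.py and subset.py both normalize a page argument with `(onerange + onerange[-1:])[:2]`, which turns a single page such as `5` into the inclusive range `[5, 5]`. A small standalone sketch of that idiom follows. The helper name `expand_ranges` is hypothetical and is not part of either script.

```python
def expand_ranges(specs):
    """Expand arguments such as ['1-3', '5'] into 1-based page numbers."""
    pages = []
    for spec in specs:
        bounds = [int(part) for part in spec.split('-')]
        # '5' gives [5], padded to [5, 5]; '1-3' gives [1, 3] and is kept
        first, last = (bounds + bounds[-1:])[:2]
        pages.extend(range(first, last + 1))
    return pages


pages = expand_ranges(['1-3', '5', '7-9'])
```

The page numbers are 1-based, as on the command line, so a caller indexing `pdf.pages` would subtract one.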
--- a/examples/watermark.py
+++ b/examples/watermark.py
@ -0,0 +1,114 @@
+#!/usr/bin/env python
+
+'''
+Simple example of watermarking using form xobjects (pdfrw).
+
+usage:   watermark.py -i my.pdf -w single_page.pdf
+
+Creates watermark.my.pdf, with every page overlaid with the
+first page from single_page.pdf.  Use -d/-o for batch mode.
+'''
+
+import sys
+import os
+
+import find_pdfrw
+from pdfrw import PdfReader, PdfWriter, PdfDict, PdfName, IndirectPdfDict, PdfArray
+from pdfrw.buildxobj import pagexobj
+
+def fixpage(page, watermark):
+
+    # Find the page's resource dictionary. Create if none
+    resources = page.inheritable.Resources
+    if resources is None:
+        resources = page.Resources = PdfDict()
+
+    # Find or create the parent's xobject dictionary
+    xobjdict = resources.XObject
+    if xobjdict is None:
+        xobjdict = resources.XObject = PdfDict()
+
+    # Allow for an infinite number of cascaded watermarks
+    index = 0
+    while 1:
+        watermark_name = '/Watermark.%d' % index
+        if watermark_name not in xobjdict:
+            break
+        index += 1
+    xobjdict[watermark_name] = watermark
+
+    # Turn the contents into an array if it is not already one
+    contents = page.Contents
+    if not isinstance(contents, PdfArray):
+        contents = page.Contents = PdfArray([contents])
+
+    # Save initial state before executing page
+    contents.insert(0, IndirectPdfDict(stream='q\n'))
+
+    # Restore initial state and append the watermark
+    contents.append(IndirectPdfDict(stream='Q %s Do\n' % watermark_name))
+    return page
+
+def watermark(input_fname, watermark_fname, output_fname=None):
+    outfn = output_fname or ('watermark.' + os.path.basename(input_fname))
+    w = pagexobj(PdfReader(watermark_fname).pages[0])
+    pages = PdfReader(input_fname).pages
+    PdfWriter().addpages([fixpage(x, w) for x in pages]).write(outfn)
+    return outfn
+
+def fix_pdf(fname, watermark_fname, indir, outdir):
+    from os import mkdir, path
+    if not path.exists(outdir):
+        mkdir(outdir)
+    watermark = pagexobj(PdfReader(watermark_fname).pages[0])
+    trailer = PdfReader(path.join(indir, fname))
+    for page in trailer.pages:
+        fixpage(page, watermark)
+    PdfWriter().write(path.join(outdir, fname), trailer)
+    return len(trailer.pages)
+    
+def batch_watermark(pdfdir, watermark_fname, outputdir='tmp'):
+    import traceback
+    from glob import glob
+    from os import path
+    fnames=glob(pdfdir+"/*.pdf")
+    total_pages = 0
+    good_files = 0
+    
+    for fname in fnames:
+        fname = path.basename(fname)
+        try:
+            total_pages += fix_pdf(fname, watermark_fname, pdfdir, outputdir)
+            good_files += 1
+            print "%s OK" %fname
+        except Exception:
+            print "%s Failed miserably" %fname
+            print traceback.format_exc()[:2000]
+            #raise
+    
+    print "success %.2f%% %s pages" % ((float(good_files) / max(len(fnames), 1)) * 100, total_pages)
+    
+if __name__ == "__main__":
+    
+    from optparse import OptionParser
+    parser = OptionParser(description = __doc__)
+    parser.add_option('-i', dest='input_fname', help='file name to be watermarked (pdf)')
+    parser.add_option('-w', dest='watermark_fname', help='watermark file name (pdf)')
+    parser.add_option('-d', dest='pdfdir', help='watermark all pdf files in this directory')
+    parser.add_option('-o', dest='outdir', help='outputdir used with option -d', default='tmp')
+    options, args = parser.parse_args()
+    
+    if options.input_fname and options.watermark_fname:
+        watermark = pagexobj(PdfReader(options.watermark_fname).pages[0])
+        outfn = 'watermark.' + os.path.basename(options.input_fname)
+        pages = PdfReader(options.input_fname).pages
+        
+        PdfWriter().addpages([fixpage(x, watermark) for x in pages]).write(outfn)
+    
+    elif options.pdfdir and options.watermark_fname:
+        batch_watermark(options.pdfdir, options.watermark_fname, options.outdir)
+    
+    else:
+        parser.print_help()
+        
+        
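
Review note: fixpage above does two things that are easy to check in isolation. It picks the first unused `/Watermark.N` key, and it brackets the existing content streams with `q` and `Q <name> Do`. This is a rough sketch using plain Python containers instead of pdfrw objects. The helper names `next_watermark_name` and `wrap_contents` are hypothetical.

```python
def next_watermark_name(xobjects):
    """Return the first '/Watermark.N' key not already present."""
    index = 0
    while '/Watermark.%d' % index in xobjects:
        index += 1
    return '/Watermark.%d' % index


def wrap_contents(contents, name):
    """Save the graphics state, run the page, restore it, then draw name."""
    return ['q\n'] + list(contents) + ['Q %s Do\n' % name]


existing = {'/Watermark.0': 'first watermark'}
name = next_watermark_name(existing)
streams = wrap_contents(['BT /F1 12 Tf (hello) Tj ET\n'], name)
```

Restoring the graphics state before `Do` means the watermark is drawn with the page's initial transform, whatever the page content changed.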
--- a/pdfrw/__init__.py
+++ b/pdfrw/__init__.py
@ -0,0 +1,16 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+__version__ = '0.1'
+
+from pdfrw.pdfwriter import PdfWriter
+from pdfrw.pdfreader import PdfReader
+from pdfrw.objects import PdfObject, PdfName, PdfArray, PdfDict, IndirectPdfDict, PdfString
+from pdfrw.tokens import PdfTokens
+from pdfrw.errors import PdfParseError
+
+# Add a tiny bit of compatibility to pyPdf
+
+PdfFileReader = PdfReader
+PdfFileWriter = PdfWriter
--- a/pdfrw/buildxobj.py
+++ b/pdfrw/buildxobj.py
@ -0,0 +1,249 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+
+This module contains code to build PDF "Form XObjects".
+
+A Form XObject allows a fragment from one PDF file to be cleanly
+included in another PDF file.
+
+Reference for syntax: "Parameters for opening PDF files" from SDK 8.1
+
+        http://www.adobe.com/devnet/acrobat/pdfs/pdf_open_parameters.pdf
+
+        supported 'page=xxx', 'viewrect=<left>,<top>,<width>,<height>'
+
+        Also supported by this, but not by Adobe:
+            'rotate=xxx'  where xxx in [0, 90, 180, 270]
+
+        Units are in points
+
+
+Reference for content:   Adobe PDF reference, sixth edition, version 1.7
+
+        http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
+
+        Form XObjects are discussed in section 4.9, page 355
+'''
+
+from pdfrw.objects import PdfDict, PdfArray, PdfName
+from pdfrw.pdfreader import PdfReader
+from pdfrw.errors import log
+
+class ViewInfo(object):
+    ''' Instantiate ViewInfo with a uri, and it will parse out
+        the filename, page, and viewrect into object attributes.
+    '''
+    doc = None
+    docname = None
+    page = None
+    viewrect = None
+    rotate = None
+
+    def __init__(self, pageinfo='', **kw):
+        pageinfo=pageinfo.split('#',1)
+        if len(pageinfo) == 2:
+            pageinfo[1:] = pageinfo[1].replace('&', '#').split('#')
+        for key in 'page viewrect'.split():
+            if pageinfo[0].startswith(key+'='):
+                break
+        else:
+            self.docname = pageinfo.pop(0)
+        for item in pageinfo:
+            key, value = item.split('=')
+            key = key.strip()
+            value = value.replace(',', ' ').split()
+            if key in ('page', 'rotate'):
+                assert len(value) == 1
+                setattr(self, key, int(value[0]))
+            elif key == 'viewrect':
+                assert len(value) == 4
+                setattr(self, key, [float(x) for x in value])
+            else:
+                log.error('Unknown option: %s', key)
+        for key, value in kw.iteritems():
+            assert hasattr(self, key), key
+            setattr(self, key, value)
+
+def get_rotation(rotate):
+    ''' Return clockwise rotation code:
+          0 = unrotated
+          1 = 90 degrees
+          2 = 180 degrees
+          3 = 270 degrees
+    '''
+    try:
+        rotate = int(rotate)
+    except (ValueError, TypeError):
+        return 0
+    if rotate % 90 != 0:
+        return 0
+    return rotate // 90
+
+def rotate_point(point, rotation):
+    ''' Rotate an (x,y) coordinate clockwise by a 
+        rotation code specifying a multiple of 90 degrees.
+    '''
+    if rotation & 1:
+        point = point[1], -point[0]
+    if rotation & 2:
+        point = -point[0], -point[1]
+    return point
+
+def rotate_rect(rect, rotation):
+    ''' Rotate both points within the rectangle, then normalize
+        the rectangle by returning the new lower left, then new
+        upper right.
+    '''
+    rect = rotate_point(rect[:2], rotation) + rotate_point(rect[2:], rotation)
+    return (min(rect[0], rect[2]), min(rect[1], rect[3]),
+            max(rect[0], rect[2]), max(rect[1], rect[3]))
+
+def getrects(inheritable, pageinfo, rotation):
+    ''' Given the inheritable attributes of a page and
+        the desired pageinfo rectangle, return the page's
+        media box and the calculated boundary (clip) box.
+    '''
+    mbox = tuple([float(x) for x in inheritable.MediaBox])
+    vrect = pageinfo.viewrect
+    if vrect is None:
+        cbox = tuple([float(x) for x in (inheritable.CropBox or mbox)])
+    else:
+        # Rotate the media box to match what the user sees,
+        # figure out the clipping box, then rotate back
+        mleft, mbot, mright, mtop = rotate_rect(mbox, rotation)
+        x, y, w, h = vrect
+        cleft = mleft + x
+        ctop = mtop - y
+        cright = cleft + w
+        cbot = ctop - h
+        cbox = max(mleft, cleft), max(mbot, cbot), min(mright, cright), min(mtop, ctop)
+        cbox = rotate_rect(cbox, -rotation)
+    return mbox, cbox
+
+
+def _cache_xobj(contents, resources, mbox, bbox, rotation):
+    ''' Return a cached Form XObject, or create a new one and cache it.
+        Adds private members x, y, w, h
+    '''
+    cachedict = contents.xobj_cachedict
+    if cachedict is None:
+        cachedict = contents.private.xobj_cachedict = {}
+    cachekey = mbox, bbox, rotation
+    result = cachedict.get(cachekey)
+    if result is None:
+        func = (_get_fullpage, _get_subpage)[mbox != bbox]
+        result = PdfDict(
+            func(contents, resources, mbox, bbox, rotation),
+            Type = PdfName.XObject,
+            Subtype = PdfName.Form,
+            FormType = 1,
+            BBox = PdfArray(bbox),
+        )
+        rect = bbox
+        if rotation:
+            matrix = rotate_point((1, 0), rotation) + rotate_point((0, 1), rotation)
+            result.Matrix = PdfArray(matrix + (0, 0))
+            rect = rotate_rect(rect, rotation)
+
+        result.private.x = rect[0]
+        result.private.y = rect[1]
+        result.private.w = rect[2] - rect[0]
+        result.private.h = rect[3] - rect[1]
+        cachedict[cachekey] = result
+    return result
+
+def _get_fullpage(contents, resources, mbox, bbox, rotation):
+    ''' fullpage is easy.  Just copy the contents,
+        set up the resources, and let _cache_xobj handle the
+        rest.
+    '''
+    return PdfDict(contents, Resources=resources)
+
+def _get_subpage(contents, resources, mbox, bbox, rotation):
+    ''' subpages *could* be as easy as full pages, but we
+        choose to complicate life by creating a Form XObject
+        for the page, and then one that references it for
+        the subpage, on the off-chance that we want multiple
+        items from the page.
+    '''
+    return PdfDict(
+        stream = '/FullPage Do\n',
+        Resources = PdfDict(
+            XObject = PdfDict(
+                FullPage = _cache_xobj(contents, resources, mbox, mbox, 0)
+            )
+        )
+    )
+
+def pagexobj(page, viewinfo=ViewInfo(), allow_compressed=True):
+    ''' pagexobj creates and returns a Form XObject for
+        a given view within a page (Defaults to entire page.)
+    '''
+    inheritable = page.inheritable
+    resources = inheritable.Resources
+    rotation = get_rotation(inheritable.Rotate)
+    mbox, bbox = getrects(inheritable, viewinfo, rotation)
+    rotation += get_rotation(viewinfo.rotate)
+    contents = page.Contents
+    # The declared length must match the actual stream length
+    assert int(contents.Length) == len(contents.stream)
+    if not allow_compressed:
+        # All filters must have been executed, leaving /Length as the only key
+        assert len([x for x in contents.iteritems()]) == 1
+    return _cache_xobj(contents, resources, mbox, bbox, rotation)
+
+
+
+def docxobj(pageinfo, doc=None, allow_compressed=True):
+    ''' docxobj creates and returns an actual Form XObject.
+        Can work standalone, or in conjunction with
+        the CacheXObj class (below).
+    '''
+    if not isinstance(pageinfo, ViewInfo):
+        pageinfo = ViewInfo(pageinfo)
+
+    # If we're explicitly passed a document,
+    # make sure we don't have one implicitly as well.
+    # If no implicit or explicit doc, then read one in
+    # from the filename.
+    if doc is not None:
+        assert pageinfo.doc is None
+        pageinfo.doc = doc
+    elif pageinfo.doc is not None:
+        doc = pageinfo.doc
+    else:
+        doc = pageinfo.doc = PdfReader(pageinfo.docname, decompress = not allow_compressed)
+    assert isinstance(doc, PdfReader)
+
+    sourcepage = doc.pages[(pageinfo.page or 1) - 1]
+    return pagexobj(sourcepage, pageinfo, allow_compressed)
+
+
+class CacheXObj(object):
+    ''' Use to keep from reparsing files over and over,
+        and to keep from making the output too much
+        bigger than it ought to be by replicating
+        unnecessary object copies.
+    '''
+    def __init__(self, decompress=False):
+        ''' Set decompress true if you need
+            the Form XObjects to be decompressed.
+            Will decompress what it can and scream
+            about the rest.
+        '''
+        self.cached_pdfs = {}
+        self.decompress = decompress
+
+    def load(self, sourcename):
+        ''' Load a Form XObject from a uri
+        '''
+        info = ViewInfo(sourcename)
+        fname = info.docname
+        pcache = self.cached_pdfs
+        doc = pcache.get(fname)
+        if doc is None:
+            doc = pcache[fname] = PdfReader(fname, decompress=self.decompress)
+        return docxobj(info, doc, allow_compressed=not self.decompress)
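
Review note: the rotation helpers in buildxobj.py work on a rotation code (0 to 3, in clockwise quarter turns) and test its two low bits. The sketch below copies `rotate_point` and `rotate_rect` from the file above so the behaviour can be tried without pdfrw. The US Letter media box is only an assumed example value.

```python
def rotate_point(point, rotation):
    """Rotate (x, y) clockwise by rotation quarter turns."""
    if rotation & 1:
        point = point[1], -point[0]
    if rotation & 2:
        point = -point[0], -point[1]
    return point


def rotate_rect(rect, rotation):
    """Rotate both corners, then return lower left and upper right."""
    rect = rotate_point(rect[:2], rotation) + rotate_point(rect[2:], rotation)
    return (min(rect[0], rect[2]), min(rect[1], rect[3]),
            max(rect[0], rect[2]), max(rect[1], rect[3]))


letter = (0, 0, 612, 792)            # assumed example media box, in points
turned = rotate_rect(letter, 1)      # one quarter turn clockwise
restored = rotate_rect(turned, -1)   # -1 has both low bits set, so it acts as 3
```

Because only the two low bits are inspected, a negative code or a code above 3 behaves like its value modulo 4, which is what getrects relies on when it calls `rotate_rect(cbox, -rotation)`.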
--- a/pdfrw/compress.py
+++ b/pdfrw/compress.py
@ -0,0 +1,26 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+Currently, this sad little file only knows how to compress
+using the flate (zlib) algorithm.  Maybe more later, but it's
+not a priority for me...
+'''
+import zlib
+from pdfrw.objects import PdfDict, PdfName
+from pdfrw.errors import log
+from pdfrw.uncompress import streamobjects
+
+def compress(mylist):
+    flate = PdfName.FlateDecode
+    for obj in streamobjects(mylist):
+        ftype = obj.Filter
+        if ftype is not None:
+            continue
+        oldstr = obj.stream
+        newstr = zlib.compress(oldstr)
+        if len(newstr) < len(oldstr) + 30:
+            obj.stream = newstr
+            obj.Filter = flate
+            obj.DecodeParms = None
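
Review note: compress() above replaces a stream with its deflated form whenever `len(newstr) < len(oldstr) + 30`. The sketch below mirrors that comparison on plain byte strings so it can be tried outside pdfrw. The helper name `maybe_compress` and the constant name `SLACK` are hypothetical.

```python
import zlib

SLACK = 30  # same constant as in the comparison in compress() above


def maybe_compress(data):
    """Return (payload, used_flate) for a raw stream given as bytes."""
    packed = zlib.compress(data)
    if len(packed) < len(data) + SLACK:
        return packed, True
    return data, False


original = b'0 0 m 10 10 l S\n' * 200
payload, used_flate = maybe_compress(original)
```

When `used_flate` is true, a writer would also have to record the filter on the stream dictionary, as compress() does by setting `obj.Filter`.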
--- a/pdfrw/errors.py
+++ b/pdfrw/errors.py
@ -0,0 +1,31 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+PDF Exceptions and error handling
+'''
+
+import logging
+from exceptions import Exception
+
+
+logging.basicConfig(
+    format='[%(levelname)s] %(filename)s:%(lineno)d %(message)s',
+    level=logging.WARNING)
+
+log = logging.getLogger('pdfrw')
+
+
+class PdfError(Exception):
+    "Abstract base class of exceptions thrown by this module"
+    def __init__(self, msg):
+        self.msg = msg
+    def __str__(self):
+        return self.msg
+
+class PdfParseError(PdfError):
+    "Error thrown by parser/tokenizer"
+
+class PdfOutputError(PdfError):
+    "Error thrown by PDF writer"
--- a/pdfrw/objects/__init__.py
+++ b/pdfrw/objects/__init__.py
@ -0,0 +1,16 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+Objects that can occur in PDF files.  The most important
+objects are arrays and dicts.  Either of these can be
+indirect or not, and dicts could have an associated
+stream.
+'''
+from pdfrw.objects.pdfname import PdfName
+from pdfrw.objects.pdfdict import PdfDict, IndirectPdfDict
+from pdfrw.objects.pdfarray import PdfArray
+from pdfrw.objects.pdfobject import PdfObject
+from pdfrw.objects.pdfstring import PdfString
+from pdfrw.objects.pdfindirect import PdfIndirect
--- a/pdfrw/objects/pdfarray.py
+++ b/pdfrw/objects/pdfarray.py
@ -0,0 +1,59 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+from pdfrw.objects.pdfindirect import PdfIndirect
+from pdfrw.objects.pdfobject import PdfObject
+
+def _resolved():
+    pass
+
+class PdfArray(list):
+    ''' A PdfArray maps the PDF file array object into a Python list.
+        It has an indirect attribute which defaults to False.
+    '''
+    indirect = False
+
+    def __init__(self, source=[]):
+        self._resolve = self._resolver
+        self.extend(source)
+
+    def _resolver(self, isinstance=isinstance, enumerate=enumerate,
+                        listiter=list.__iter__,
+                        PdfIndirect=PdfIndirect, resolved=_resolved,
+                        PdfNull=PdfObject('null')):
+        for index, value in enumerate(list.__iter__(self)):
+            if isinstance(value, PdfIndirect):
+                value = value.real_value()
+                if value is None:
+                    value = PdfNull
+                self[index] = value
+        self._resolve = resolved
+
+    def __getitem__(self, index, listget=list.__getitem__):
+        self._resolve()
+        return listget(self, index)
+
+    def __getslice__(self, i, j, listget=list.__getslice__):
+        self._resolve()
+        return listget(self, i, j)
+
+    def __iter__(self, listiter=list.__iter__):
+        self._resolve()
+        return listiter(self)
+
+    def count(self, item):
+        self._resolve()
+        return list.count(self, item)
+    def index(self, item):
+        self._resolve()
+        return list.index(self, item)
+    def remove(self, item):
+        self._resolve()
+        return list.remove(self, item)
+    def sort(self, *args, **kw):
+        self._resolve()
+        return list.sort(self, *args, **kw)
+    def pop(self, *args):
+        self._resolve()
+        return list.pop(self, *args)
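
Review note: PdfArray defers resolving indirect references until the first read, then swaps `_resolve` for a do-nothing function so later reads pay no extra cost. The sketch below shows the same pattern with stand-in classes. `Deferred` is a hypothetical substitute for PdfIndirect, and `LazyList` covers only item access and iteration.

```python
class Deferred(object):
    """Hypothetical stand-in for PdfIndirect: a value computed on demand."""
    def __init__(self, thunk):
        self.thunk = thunk

    def real_value(self):
        return self.thunk()


def _resolved():
    # Installed in place of the resolver once every item is resolved
    pass


class LazyList(list):
    def __init__(self, source=()):
        list.__init__(self)
        self._resolve = self._resolver
        self.extend(source)

    def _resolver(self):
        # Call the base-class methods directly so the overridden
        # accessors below are not re-entered during resolution
        for index, value in enumerate(list.__iter__(self)):
            if isinstance(value, Deferred):
                list.__setitem__(self, index, value.real_value())
        self._resolve = _resolved

    def __getitem__(self, index):
        self._resolve()
        return list.__getitem__(self, index)

    def __iter__(self):
        self._resolve()
        return list.__iter__(self)


calls = []


def load():
    calls.append(1)
    return 'page object'


items = LazyList([Deferred(load), 42])
untouched = list(calls)          # nothing has been resolved yet
first = items[0]                 # the first read resolves every item
everything = [x for x in items]  # later reads skip the resolver
```

Storing the resolver as an instance attribute is what makes the swap cheap: after the first read, `self._resolve()` is a call to an empty function.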
--- a/pdfrw/objects/pdfdict.py
+++ b/pdfrw/objects/pdfdict.py
@ -0,0 +1,205 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+from pdfrw.objects.pdfname import PdfName
+from pdfrw.objects.pdfindirect import PdfIndirect
+from pdfrw.objects.pdfobject import PdfObject
+
+class _DictSearch(object):
+    '''  Used to search for inheritable attributes.
+    '''
+    def __init__(self, basedict):
+        self.basedict = basedict
+    def __getattr__(self, name, PdfName=PdfName):
+        return self[PdfName(name)]
+    def __getitem__(self, name, set=set, getattr=getattr, id=id):
+        visited = set()
+        mydict = self.basedict
+        while 1:
+            value = mydict[name]
+            if value is not None:
+                return value
+            myid = id(mydict)
+            assert myid not in visited
+            visited.add(myid)
+            mydict = mydict.Parent
+            if mydict is None:
+                return
+
+class _Private(object):
+    ''' Used to store private attributes (not output to PDF files)
+        on PdfDict classes
+    '''
+    def __init__(self, pdfdict):
+        vars(self)['pdfdict'] = pdfdict
+    def __setattr__(self, name, value):
+        vars(self.pdfdict)[name] = value
+
+class PdfDict(dict):
+    ''' PdfDict objects are subclassed dictionaries with the following features:
+
+        - Every key in the dictionary starts with "/"
+
+        - A dictionary item can be deleted by assigning it to None
+
+        - Keys that (after the initial "/") conform to Python naming conventions
+          can also be accessed (set and retrieved) as attributes of the dictionary.
+          E.g.  mydict.Page is the same thing as mydict['/Page']
+
+        - Private attributes (not in the PDF space) can be set on the dictionary
+          object attribute dictionary by using the private attribute:
+
+                mydict.private.foo = 3
+                mydict.foo = 5
+                x = mydict.foo       # x will now contain 3
+                y = mydict['/foo']   # y will now contain 5
+
+          Most standard Adobe dictionary keys start with an upper case letter,
+          so to avoid conflicts, it is best to start private attributes with
+          lower case letters.
+
+        - PdfDicts have the following read-only properties:
+
+            - private -- as discussed above, provides write access to dictionary's
+                         attributes
+            - inheritable -- this creates and returns a "view" attribute that
+                         will search through the object hierarchy for any desired
+                         attribute, such as /Rotate or /MediaBox
+
+        - PdfDicts also have the following special attributes:
+            - indirect is not stored in the PDF dictionary, but in the object's
+              attribute dictionary
+            - stream is also stored in the object's attribute dictionary
+              and will also update the stream length.
+            - _stream will store in the object's attribute dictionary without
+              updating the stream length.
+
+            It is possible, for example, to have a PDF name such as "/indirect"
+            or "/stream", but you cannot access such a name as an attribute:
+
+                mydict.indirect -- accesses object's attribute dictionary
+                mydict["/indirect"] -- accesses actual PDF dictionary
+    '''
+    indirect = False
+    stream = None
+
+    _special = dict(indirect = ('indirect', False),
+                    stream = ('stream', True),
+                    _stream = ('stream', False),
+                   )
+
+    def __setitem__(self, name, value, setter=dict.__setitem__):
+        assert name.startswith('/'), name
+        if value is not None:
+            setter(self, name, value)
+        elif name in self:
+            del self[name]
+
+    def __init__(self, *args, **kw):
+        if args:
+            if len(args) == 1:
+                args = args[0]
+            self.update(args)
+            if isinstance(args, PdfDict):
+                self.indirect = args.indirect
+                self._stream = args.stream
+        for key, value in kw.iteritems():
+            setattr(self, key, value)
+
+    def __getattr__(self, name, PdfName=PdfName):
+        ''' If the attribute doesn't exist on the dictionary object,
+            try to slap a '/' in front of it and get it out
+            of the actual dictionary itself.
+        '''
+        return self.get(PdfName(name))
+
+    def get(self, key, dictget=dict.get, isinstance=isinstance, PdfIndirect=PdfIndirect):
+        ''' Get a value out of the dictionary, after resolving any indirect objects.
+        '''
+        value = dictget(self, key)
+        if isinstance(value, PdfIndirect):
+            self[key] = value = value.real_value()
+        return value
+
+    def __getitem__(self, key):
+        return self.get(key)
+
+    def __setattr__(self, name, value, special=_special.get, PdfName=PdfName, vars=vars):
+        ''' Set an attribute on the dictionary.  Handle the keywords
+            indirect, stream, and _stream specially (for content objects)
+        '''
+        info = special(name)
+        if info is None:
+            self[PdfName(name)] = value
+        else:
+            name, setlen = info
+            vars(self)[name] = value
+            if setlen:
+                notnone = value is not None
+                self.Length = notnone and PdfObject(len(value)) or None
+
+    def iteritems(self, dictiter=dict.iteritems, isinstance=isinstance, PdfIndirect=PdfIndirect):
+        ''' Iterate over the dictionary, resolving any unresolved objects
+        '''
+        for key, value in list(dictiter(self)):
+            if isinstance(value, PdfIndirect):
+                self[key] = value = value.real_value()
+            if value is not None:
+                assert key.startswith('/'), (key, value)
+                yield key, value
+
+    def items(self):
+        return list(self.iteritems())
+    def itervalues(self):
+        for key, value in self.iteritems():
+            yield value
+    def values(self):
+        return list((value for key, value in self.iteritems()))
+    def keys(self):
+        return list((key for key, value in self.iteritems()))
+    def __iter__(self):
+        for key, value in self.iteritems():
+            yield key
+    def iterkeys(self):
+        return iter(self)
+
+    def copy(self):
+        return type(self)(self)
+
+    def pop(self, key):
+        value = self.get(key)
+        del self[key]
+        return value
+
+    def popitem(self):
+        key, value = dict.popitem(self)
+        if isinstance(value, PdfIndirect):
+            value = value.real_value()
+        return key, value
+
+    def inheritable(self):
+        ''' Search through ancestors as needed for inheritable
+            dictionary items.
+            NOTE:  You might think it would be a good idea
+            to cache this class, but then you'd have to worry
+            about it pointing to the wrong dictionary if you
+            made a copy of the object...
+        '''
+        return _DictSearch(self)
+    inheritable = property(inheritable)
+
+    def private(self):
+        ''' Allows setting private metadata for use in
+            processing (not sent to PDF file).
+            See note on inheritable
+        '''
+        return _Private(self)
+    private = property(private)
+
+class IndirectPdfDict(PdfDict):
+    ''' IndirectPdfDict is a convenience class.  You could
+        create a direct PdfDict and then set indirect = True on it,
+        or you could just create an IndirectPdfDict.
+    '''
+    indirect = True
--- a/pdfrw/objects/pdfindirect.py
+++ b/pdfrw/objects/pdfindirect.py
@ -0,0 +1,20 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+class _NotLoaded(object):
+    pass
+
+class PdfIndirect(tuple):
+    ''' A placeholder for an object that hasn't been read in yet.
+        The object itself is the (object number, generation number) tuple.
+        The attributes include information about where the object is
+        referenced from and the file object to retrieve the real object from.
+    '''
+    value = _NotLoaded
+
+    def real_value(self, NotLoaded=_NotLoaded):
+        value = self.value
+        if value is NotLoaded:
+            value = self.value = self._loader(self)
+        return value
--- a/pdfrw/objects/pdfname.py
+++ b/pdfrw/objects/pdfname.py
@ -0,0 +1,17 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+from pdfrw.objects.pdfobject import PdfObject
+
+class PdfName(object):
+    ''' PdfName is a simple way to get a PDF name from a string:
+
+                PdfName.FooBar == PdfObject('/FooBar')
+    '''
+    def __getattr__(self, name):
+        return self(name)
+    def __call__(self, name, PdfObject=PdfObject):
+        return PdfObject('/' + name)
+PdfName = PdfName()
+
--- a/pdfrw/objects/pdfobject.py
+++ b/pdfrw/objects/pdfobject.py
@ -0,0 +1,10 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+class PdfObject(str):
+    ''' A PdfObject is a textual representation of any PDF file object
+        other than an array, dict or string. It has an indirect attribute
+        which defaults to False.
+    '''
+    indirect = False
--- a/pdfrw/objects/pdfstring.py
+++ b/pdfrw/objects/pdfstring.py
@ -0,0 +1,73 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+import re
+
+class PdfString(str):
+    ''' A PdfString is an encoded string.  It has a decode
+        method to get the actual string data out, and there
+        is an encode class method to create such a string.
+        Like any PDF object, it could be indirect, but it
+        defaults to being a direct object.
+    '''
+    indirect = False
+    unescape_dict = {'\\b':'\b', '\\f':'\f', '\\n':'\n',
+                     '\\r':'\r', '\\t':'\t',
+                     '\\\r\n': '', '\\\r':'', '\\\n':'',
+                     '\\\\':'\\', '\\':'',
+                    }
+    unescape_pattern = r'(\\\\|\\b|\\f|\\n|\\r|\\t|\\\r\n|\\\r|\\\n|\\[0-9]+|\\)'
+    unescape_func = re.compile(unescape_pattern).split
+
+    hex_pattern = '([a-fA-F0-9][a-fA-F0-9]|[a-fA-F0-9])'
+    hex_func = re.compile(hex_pattern).split
+
+    hex_pattern2 = '([a-fA-F0-9][a-fA-F0-9][a-fA-F0-9][a-fA-F0-9]|[a-fA-F0-9][a-fA-F0-9]|[a-fA-F0-9])'
+    hex_func2 = re.compile(hex_pattern2).split
+
+    hex_funcs = hex_func, hex_func2
+
+    def decode_regular(self, remap=chr):
+        assert self[0] == '(' and self[-1] == ')'
+        mylist = self.unescape_func(self[1:-1])
+        result = []
+        unescape = self.unescape_dict.get
+        for chunk in mylist:
+            chunk = unescape(chunk, chunk)
+            if chunk.startswith('\\') and len(chunk) > 1:
+                value = int(chunk[1:], 8)
+                # FIXME: TODO: Handle unicode here
+                if value > 127:
+                    value = 127
+                chunk = remap(value)
+            if chunk:
+                result.append(chunk)
+        return ''.join(result)
+
+    def decode_hex(self, remap=chr, twobytes=False):
+        data = ''.join(self.split())
+        data = self.hex_funcs[twobytes](data)
+        chars = data[1::2]
+        other = data[0::2]
+        assert other[0] == '<' and other[-1] == '>' and ''.join(other) == '<>', self
+        return ''.join([remap(int(x, 16)) for x in chars])
+
+    def decode(self, remap=chr, twobytes=False):
+        if self.startswith('('):
+            return self.decode_regular(remap)
+
+        else:
+            return self.decode_hex(remap, twobytes)
+
+    def encode(cls, source, usehex=False):
+        assert not usehex, "Not supported yet"
+        if isinstance(source, unicode):
+            source = source.encode('utf-8')
+        else:
+            source = str(source)
+        source = source.replace('\\', '\\\\')
+        source = source.replace('(', '\\(')
+        source = source.replace(')', '\\)')
+        return cls('(' +source + ')')
+    encode = classmethod(encode)
--- a/pdfrw/pdfreader.py
+++ b/pdfrw/pdfreader.py
@ -0,0 +1,433 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+The PdfReader class reads an entire PDF file into memory and
+parses the top-level container objects.  (It does not parse
+into streams.)  The object subclasses PdfDict, and the
+document pages are stored in a list in the pages attribute
+of the object.
+'''
+import gc
+
+from pdfrw.errors import PdfParseError, log
+from pdfrw.tokens import PdfTokens
+from pdfrw.objects import PdfDict, PdfArray, PdfName, PdfObject, PdfIndirect
+from pdfrw.uncompress import uncompress
+
+class PdfReader(PdfDict):
+
+    warned_bad_stream_start = False  # Used to keep from spewing warnings
+    warned_bad_stream_end = False  # Used to keep from spewing warnings
+
+    def findindirect(self, objnum, gennum, PdfIndirect=PdfIndirect, int=int):
+        ''' Return a previously loaded indirect object, or create
+            a placeholder for it.
+        '''
+        key = int(objnum), int(gennum)
+        result = self.indirect_objects.get(key)
+        if result is None:
+            self.indirect_objects[key] = result = PdfIndirect(key)
+            self.deferred_objects.add(key)
+            result._loader = self.loadindirect
+        return result
+
+    def readarray(self, source, PdfArray=PdfArray):
+        ''' Found a [ token.  Parse the tokens after that.
+        '''
+        specialget = self.special.get
+        result = []
+        pop = result.pop
+        append = result.append
+
+        for value in source:
+            if value in ']R':
+                if value == ']':
+                    break
+                generation = pop()
+                value = self.findindirect(pop(), generation)
+            else:
+                func = specialget(value)
+                if func is not None:
+                    value = func(source)
+            append(value)
+        return PdfArray(result)
+
+    def readdict(self, source, PdfDict=PdfDict):
+        ''' Found a << token.  Parse the tokens after that.
+        '''
+        specialget = self.special.get
+        result = PdfDict()
+        next = source.next
+
+        tok = next()
+        while tok != '>>':
+            if not tok.startswith('/'):
+                source.exception('Expected PDF /name object')
+            key = tok
+            value = next()
+            func = specialget(value)
+            if func is not None:
+                value = func(source)
+                tok = next()
+            else:
+                tok = next()
+                if value.isdigit() and tok.isdigit():
+                    if next() != 'R':
+                        source.exception('Expected "R" following two integers')
+                    value = self.findindirect(value, tok)
+                    tok = next()
+            result[key] = value
+        return result
+
+    def empty_obj(self, source, PdfObject=PdfObject):
+        ''' Some silly git put an empty object in the
+            file.  Back up so the caller sees the endobj.
+        '''
+        source.floc = source.tokstart
+
+    def badtoken(self, source):
+        ''' Didn't see that coming.
+        '''
+        source.exception('Unexpected delimiter')
+
+    def findstream(self, obj, tok, source, PdfDict=PdfDict, isinstance=isinstance, len=len):
+        ''' Figure out if there is a content stream
+            following an object, and return the start
+            pointer to the content stream if so.
+
+            (We can't read it yet, because we might not
+            know how long it is, because Length might
+            be an indirect object.)
+        '''
+
+        isdict = isinstance(obj, PdfDict)
+        if not isdict or tok != 'stream':
+            source.exception("Expected 'endobj'%s token", isdict and " or 'stream'" or '')
+        fdata = source.fdata
+        startstream = source.tokstart + len(tok)
+        gotcr = fdata[startstream] == '\r'
+        startstream += gotcr
+        gotlf = fdata[startstream] == '\n'
+        startstream += gotlf
+        if not gotlf:
+            if not gotcr:
+                source.exception(r'stream keyword not followed by \n')
+            if not self.warned_bad_stream_start:
+                source.warning(r"stream keyword terminated by \r without \n")
+                self.private.warned_bad_stream_start = True
+        return startstream
+
+    def readstream(self, obj, startstream, source,
+                     streamending = 'endstream endobj'.split(), int=int):
+        fdata = source.fdata
+        length =  int(obj.Length)
+        source.floc = target_endstream = startstream + length
+        endit = source.multiple(2)
+        obj._stream = fdata[startstream:target_endstream]
+        if endit == streamending:
+            return
+
+        # The length attribute does not match the distance between the
+        # stream and endstream keywords.
+
+        do_warn, self.warned_bad_stream_end = self.warned_bad_stream_end, False
+
+        #TODO:  Extract maxstream from dictionary of object offsets
+        # and use rfind instead of find.
+        maxstream = len(fdata) - 20
+        endstream = fdata.find('endstream', startstream, maxstream)
+        source.floc = startstream
+        room = endstream - startstream
+        if endstream < 0:
+            source.error('Could not find endstream')
+            return
+        if length == room + 1 and fdata[startstream-2:startstream] == '\r\n':
+            source.warning(r"stream keyword terminated by \r without \n")
+            obj._stream = fdata[startstream-1:target_endstream-1]
+            return
+        source.floc = endstream
+        if length > room:
+            source.error('stream /Length attribute (%d) appears to be too big (size %d) -- adjusting',
+                             length, room)
+            obj.stream = fdata[startstream:endstream]
+            return
+        if fdata[target_endstream:endstream].rstrip():
+            source.error('stream /Length attribute (%d) might be smaller than data size (%d)',
+                             length, room)
+            return
+        endobj = fdata.find('endobj', endstream, maxstream)
+        if endobj < 0:
+            source.error('Could not find endobj after endstream')
+            return
+        if fdata[endstream:endobj].rstrip() != 'endstream':
+            source.error('Unexpected data between endstream and endobj')
+            return
+        source.error('Illegal endstream/endobj combination')
+
+    def loadindirect(self, key):
+        result = self.indirect_objects.get(key)
+        if not isinstance(result, PdfIndirect):
+            return result
+        source = self.source
+        offset = int(self.source.obj_offsets.get(key, '0'))
+        if not offset:
+            log.warning("Did not find PDF object %s" % (key,))
+            return None
+
+        # Read the object header and validate it
+        objnum, gennum = key
+        source.floc = offset
+        objid = source.multiple(3)
+        ok = len(objid) == 3
+        ok = ok and objid[0].isdigit() and int(objid[0]) == objnum
+        ok = ok and objid[1].isdigit() and int(objid[1]) == gennum
+        ok = ok and objid[2] == 'obj'
+        if not ok:
+            source.floc = offset
+            source.next()
+            objheader = '%d %d obj' % (objnum, gennum)
+            fdata = source.fdata
+            offset2 = fdata.find('\n' + objheader) + 1 or fdata.find('\r' + objheader) + 1
+            if not offset2 or fdata.find(fdata[offset2-1] + objheader, offset2) > 0:
+                source.warning("Expected indirect object '%s'" % objheader)
+                return None
+            source.warning("Indirect object %s found at incorrect offset %d (expected offset %d)" %
+                                     (objheader, offset2, offset))
+            source.floc = offset2 + len(objheader)
+
+        # Read the object, and call special code if it starts
+        # an array or dictionary
+        obj = source.next()
+        func = self.special.get(obj)
+        if func is not None:
+            obj = func(source)
+
+        self.indirect_objects[key] = obj
+        self.deferred_objects.remove(key)
+
+        # Mark the object as indirect, and
+        # add it to the list of streams if it starts a stream
+        obj.indirect = key
+        tok = source.next()
+        if tok != 'endobj':
+            self.readstream(obj, self.findstream(obj, tok, source), source)
+        return obj
+
+    def findxref(fdata):
+        ''' Find the cross reference section at the end of a file
+        '''
+        startloc = fdata.rfind('startxref')
+        if startloc < 0:
+            raise PdfParseError('Did not find "startxref" at end of file')
+        source = PdfTokens(fdata, startloc, False)
+        tok = source.next()
+        assert tok == 'startxref'  # (We just checked this...)
+        tableloc = source.next_default()
+        if not tableloc.isdigit():
+            source.exception('Expected table location')
+        if source.next_default().rstrip().lstrip('%') != 'EOF':
+            source.exception('Expected %%EOF')
+        return startloc, PdfTokens(fdata, int(tableloc), True)
+    findxref = staticmethod(findxref)
+
+    def parsexref(self, source, int=int, range=range):
+        ''' Parse (one of) the cross-reference file section(s)
+        '''
+        fdata = source.fdata
+        setdefault = source.obj_offsets.setdefault
+        add_offset = source.all_offsets.append
+        next = source.next
+        tok = next()
+        if tok != 'xref':
+            source.exception('Expected "xref" keyword')
+        start = source.floc
+        try:
+            while 1:
+                tok = next()
+                if tok == 'trailer':
+                    return
+                startobj = int(tok)
+                for objnum in range(startobj, startobj + int(next())):
+                    offset = int(next())
+                    generation = int(next())
+                    inuse = next()
+                    if inuse == 'n':
+                        if offset != 0:
+                            setdefault((objnum, generation), offset)
+                            add_offset(offset)
+                    elif inuse != 'f':
+                        raise ValueError
+        except:
+            pass
+        # Table formatted incorrectly.  See if we can figure it out anyway.
+        try:
+            end = source.fdata.rindex('trailer', start)
+            table = source.fdata[start:end].splitlines()
+            for line in table:
+                tokens = line.split()
+                if len(tokens) == 2:
+                    objnum = int(tokens[0])
+                elif len(tokens) == 3:
+                    offset, generation, inuse = int(tokens[0]), int(tokens[1]), tokens[2]
+                    if offset != 0 and inuse == 'n':
+                        setdefault((objnum, generation), offset)
+                        add_offset(offset)
+                    objnum += 1
+                elif tokens:
+                    log.error('Invalid line in xref table: %s' % repr(line))
+                    raise ValueError
+            log.warning('Badly formatted xref table')
+            source.floc = end
+            source.next()
+        except:
+            source.floc = start
+            source.exception('Invalid table format')
+
+    def readpages(self, node):
+        pagename=PdfName.Page
+        pagesname=PdfName.Pages
+        catalogname = PdfName.Catalog
+        typename = PdfName.Type
+        kidname = PdfName.Kids
+
+        # PDFs can have arbitrarily nested Pages/Page
+        # dictionary structures.
+        def readnode(node):
+            nodetype = node[typename]
+            if nodetype == pagename:
+                yield node
+            elif nodetype == pagesname:
+                for node in node[kidname]:
+                    for node in readnode(node):
+                        yield node
+            elif nodetype == catalogname:
+                for node in readnode(node[pagesname]):
+                    yield node
+            else:
+                log.error('Expected /Page or /Pages dictionary, got %s' % repr(node))
+        try:
+            return list(readnode(node))
+        except (AttributeError, TypeError), s:
+            log.error('Invalid page tree: %s' % s)
+            return []
+
+    def __init__(self, fname=None, fdata=None, decompress=False, disable_gc=True):
+
+        # Runs a lot faster with GC off.
+        disable_gc = disable_gc and gc.isenabled()
+        try:
+            if disable_gc:
+                gc.disable()
+            if fname is not None:
+                assert fdata is None
+                # Allow reading preexisting streams like pyPdf
+                if hasattr(fname, 'read'):
+                    fdata = fname.read()
+                else:
+                    try:
+                        f = open(fname, 'rb')
+                        fdata = f.read()
+                        f.close()
+                    except IOError:
+                        raise PdfParseError('Could not read PDF file %s' % fname)
+
+            assert fdata is not None
+            if not fdata.startswith('%PDF-'):
+                startloc = fdata.find('%PDF-')
+                if startloc >= 0:
+                    log.warning('PDF header not at beginning of file')
+                else:
+                    lines = fdata.lstrip().splitlines()
+                    if not lines:
+                        raise PdfParseError('Empty PDF file!')
+                    raise PdfParseError('Invalid PDF header: %s' % repr(lines[0]))
+
+            endloc = fdata.rfind('%EOF')
+            if endloc < 0:
+                raise PdfParseError('EOF mark not found: %s' % repr(fdata[-20:]))
+            endloc += 6
+            junk = fdata[endloc:]
+            fdata = fdata[:endloc]
+            if junk.rstrip('\00').strip():
+                log.warning('Extra data at end of file')
+
+            private = self.private
+            private.indirect_objects = {}
+            private.deferred_objects = set()
+            private.special = {'<<': self.readdict,
+                               '[': self.readarray,
+                               'endobj': self.empty_obj,
+                               }
+            for tok in r'\ ( ) < > { } ] >> %'.split():
+                self.special[tok] = self.badtoken
+
+
+            startloc, source = self.findxref(fdata)
+            private.source = source
+            xref_table_list = []
+            source.all_offsets = []
+            while 1:
+                source.obj_offsets = {}
+                # Loop through all the cross-reference tables
+                self.parsexref(source)
+                tok = source.next()
+                if tok != '<<':
+                    source.exception('Expected "<<" starting catalog')
+
+                newdict = self.readdict(source)
+
+                token = source.next()
+                if token != 'startxref' and not xref_table_list:
+                    source.warning('Expected "startxref" at end of xref table')
+
+                # Loop if any previously-written tables.
+                prev = newdict.Prev
+                if prev is None:
+                    break
+                if not xref_table_list:
+                    newdict.Prev = None
+                    original_indirect = self.indirect_objects.copy()
+                    original_newdict = newdict
+                source.floc = int(prev)
+                xref_table_list.append(source.obj_offsets)
+                self.indirect_objects.clear()
+
+            if xref_table_list:
+                for update in reversed(xref_table_list):
+                    source.obj_offsets.update(update)
+                self.indirect_objects.clear()
+                self.indirect_objects.update(original_indirect)
+                newdict = original_newdict
+            self.update(newdict)
+
+            #self.read_all_indirect(source)
+            private.pages = self.readpages(self.Root)
+            if decompress:
+                self.uncompress()
+
+            # For compatibility with pyPdf
+            private.numPages = len(self.pages)
+        finally:
+            if disable_gc:
+                gc.enable()
+
+    # For compatibility with pyPdf
+    def getPage(self, pagenum):
+        return self.pages[pagenum]
+
+    def read_all(self):
+        deferred = self.deferred_objects
+        prev = set()
+        while 1:
+            new = deferred - prev
+            if not new:
+                break
+            prev |= deferred
+            for key in new:
+                self.loadindirect(key)
+
+    def uncompress(self):
+        self.read_all()
+        uncompress(self.indirect_objects.itervalues())
--- a/pdfrw/pdfwriter.py
+++ b/pdfrw/pdfwriter.py
@ -0,0 +1,295 @@
+#!/usr/bin/env python
+
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+The PdfWriter class writes an entire PDF file out to disk.
+
+The writing process is not at all optimized or organized.
+
+An instance of the PdfWriter class has two methods:
+    addpage(page)
+and
+    write(fname)
+
+addpage() assumes that the pages are part of a valid
+tree/forest of PDF objects.
+'''
+
+try:
+    set
+except NameError:
+    from sets import Set as set
+
+from pdfrw.objects import PdfName, PdfArray, PdfDict, IndirectPdfDict, PdfObject, PdfString
+from pdfrw.compress import compress as do_compress
+from pdfrw.errors import PdfOutputError, log
+
+NullObject = PdfObject('null')
+NullObject.indirect = True
+NullObject.Type = 'Null object'
+
+def FormatObjects(f, trailer, version='1.3', compress=True, killobj=(),
+        id=id, isinstance=isinstance, getattr=getattr,len=len,
+        sum=sum, set=set, str=str, basestring=basestring,
+        hasattr=hasattr, repr=repr, enumerate=enumerate,
+        list=list, dict=dict, tuple=tuple,
+        do_compress=do_compress, PdfArray=PdfArray,
+        PdfDict=PdfDict, PdfObject=PdfObject, encode=PdfString.encode):
+    ''' FormatObjects performs the actual formatting and disk write.
+        Should be a class, was a class, turned into nested functions
+        for performance (to reduce attribute lookups).
+    '''
+
+    def add(obj):
+        ''' Add an object to our list, if it's an indirect
+            object.  Just format it if not.
+        '''
+        # Can't hash dicts, so just hash the object ID
+        objid = id(obj)
+
+        # Automatically set stream objects to indirect
+        if isinstance(obj, PdfDict):
+            indirect = obj.indirect or (obj.stream is not None)
+        else:
+            indirect = getattr(obj, 'indirect', False)
+
+        if not indirect:
+            if objid in visited:
+                log.warning('Replicating direct %s object, should be indirect for optimal file size' % type(obj))
+                obj = type(obj)(obj)
+                objid = id(obj)
+            visiting(objid)
+            result = format_obj(obj)
+            leaving(objid)
+            return result
+
+        objnum = indirect_dict_get(objid)
+
+        # If we haven't seen the object yet, we need to
+        # add it to the indirect object list.
+        if objnum is None:
+            swapped = swapobj(objid)
+            if swapped is not None:
+                old_id = objid
+                obj = swapped
+                objid = id(obj)
+                objnum = indirect_dict_get(objid)
+                if objnum is not None:
+                    indirect_dict[old_id] = objnum
+                    return '%s 0 R' % objnum
+            objnum = len(objlist) + 1
+            objlist_append(None)
+            indirect_dict[objid] = objnum
+            deferred.append((objnum-1, obj))
+        return '%s 0 R' % objnum
+
+    def format_array(myarray, formatter):
+        # Format array data into semi-readable ASCII
+        if sum([len(x) for x in myarray]) <= 70:
+            return formatter % space_join(myarray)
+        return format_big(myarray, formatter)
+
+    def format_big(myarray, formatter):
+        bigarray = []
+        count = 1000000
+        for x in myarray:
+            lenx = len(x) + 1
+            count += lenx
+            if count > 71:
+                subarray = []
+                bigarray.append(subarray)
+                count = lenx
+            subarray.append(x)
+        return formatter % lf_join([space_join(x) for x in bigarray])
+
+    def format_obj(obj):
+        ''' format PDF object data into semi-readable ASCII.
+            May mutually recurse with add() -- add() will
+            return references for indirect objects, and add
+            the indirect object to the list.
+        '''
+        while 1:
+            if isinstance(obj, (list, dict, tuple)):
+                if isinstance(obj, PdfArray):
+                    myarray = [add(x) for x in obj]
+                    return format_array(myarray, '[%s]')
+                elif isinstance(obj, PdfDict):
+                    if compress and obj.stream:
+                        do_compress([obj])
+                    myarray = []
+                    dictkeys = [str(x) for x in obj.keys()]
+                    dictkeys.sort()
+                    for key in dictkeys:
+                        myarray.append(key)
+                        myarray.append(add(obj[key]))
+                    result = format_array(myarray, '<<%s>>')
+                    stream = obj.stream
+                    if stream is not None:
+                        result = '%s\nstream\n%s\nendstream' % (result, stream)
+                    return result
+                obj = (PdfArray, PdfDict)[isinstance(obj, dict)](obj)
+                continue
+
+            if not hasattr(obj, 'indirect') and isinstance(obj, basestring):
+                return encode(obj)
+            return str(getattr(obj, 'encoded', obj))
+
+    def format_deferred():
+        while deferred:
+            index, obj = deferred.pop()
+            objlist[index] = format_obj(obj)
+
+
+    indirect_dict = {}
+    indirect_dict_get = indirect_dict.get
+    objlist = []
+    objlist_append = objlist.append
+    visited = set()
+    visiting = visited.add
+    leaving = visited.remove
+    space_join = ' '.join
+    lf_join = '\n  '.join
+    f_write = f.write
+
+    deferred = []
+
+    # Don't reference old catalog or pages objects -- swap references to new ones.
+    swapobj = {PdfName.Catalog:trailer.Root, PdfName.Pages:trailer.Root.Pages, None:trailer}.get
+    swapobj = [(objid, swapobj(obj.Type)) for objid, obj in killobj.iteritems()]
+    swapobj = dict((objid, obj is None and NullObject or obj) for objid, obj in swapobj).get
+
+    for objid in killobj:
+        assert swapobj(objid) is not None
+
+    # The first format of trailer gets all the information,
+    # but we throw away the actual trailer formatting.
+    format_obj(trailer)
+    # Keep formatting until we're done.
+    # (Used to recurse inside format_obj for this, but
+    #  hit system limit.)
+    format_deferred()
+    # Now we know the size, so we update the trailer dict
+    # and get the formatted data.
+    trailer.Size = PdfObject(len(objlist) + 1)
+    trailer = format_obj(trailer)
+
+    # Now we have all the pieces to write out to the file.
+    # Keep careful track of the counts while we do it so
+    # we can correctly build the cross-reference.
+
+    header = '%%PDF-%s\n%%\xe2\xe3\xcf\xd3\n' % version
+    f_write(header)
+    offset = len(header)
+    offsets = [(0, 65535, 'f')]
+    offsets_append = offsets.append
+
+    for i, x in enumerate(objlist):
+        objstr = '%s 0 obj\n%s\nendobj\n' % (i + 1, x)
+        offsets_append((offset, 0, 'n'))
+        offset += len(objstr)
+        f_write(objstr)
+
+    f_write('xref\n0 %s\n' % len(offsets))
+    for x in offsets:
+        f_write('%010d %05d %s\r\n' % x)
+    f_write('trailer\n\n%s\nstartxref\n%s\n%%%%EOF\n' % (trailer, offset))
+
+class PdfWriter(object):
+
+    _trailer = None
+
+    def __init__(self, version='1.3', compress=False):
+        self.pagearray = PdfArray()
+        self.compress = compress
+        self.version = version
+        self.killobj = {}
+
+    def addpage(self, page):
+        self._trailer = None
+        if page.Type != PdfName.Page:
+            raise PdfOutputError('Bad /Type:  Expected %s, found %s'
+                                  % (PdfName.Page, page.Type))
+        inheritable = page.inheritable # searches for resources
+        self.pagearray.append(
+            IndirectPdfDict(
+                page,
+                Resources = inheritable.Resources,
+                MediaBox = inheritable.MediaBox,
+                CropBox = inheritable.CropBox,
+                Rotate = inheritable.Rotate,
+            )
+        )
+
+        # Add parents in the hierarchy to objects we
+        # don't want to output
+        killobj = self.killobj
+        obj = page.Parent
+        while obj is not None:
+            objid = id(obj)
+            if objid in killobj:
+                break
+            killobj[objid] = obj
+            obj = obj.Parent
+        return self
+
+    addPage = addpage  # for compatibility with pyPdf
+
+    def addpages(self, pagelist):
+        for page in pagelist:
+            self.addpage(page)
+        return self
+
+    def _get_trailer(self):
+        trailer = self._trailer
+        if trailer is not None:
+            return trailer
+
+        # Create the basic object structure of the PDF file
+        trailer = PdfDict(
+            Root = IndirectPdfDict(
+                Type = PdfName.Catalog,
+                Pages = IndirectPdfDict(
+                    Type = PdfName.Pages,
+                    Count = PdfObject(len(self.pagearray)),
+                    Kids = self.pagearray
+                )
+            )
+        )
+        # Make every page point back to the /Pages tree node
+        pagedict = trailer.Root.Pages
+        for page in pagedict.Kids:
+            page.Parent = pagedict
+        self._trailer = trailer
+        return trailer
+
+    def _set_trailer(self, trailer):
+        self._trailer = trailer
+
+    trailer = property(_get_trailer, _set_trailer)
+
+    def write(self, fname, trailer=None):
+        trailer = trailer or self.trailer
+
+        # Dump the data.  We either have a filename or a preexisting
+        # file object.
+        preexisting = hasattr(fname, 'write')
+        f = preexisting and fname or open(fname, 'wb')
+        FormatObjects(f, trailer, self.version, self.compress, self.killobj)
+        if not preexisting:
+            f.close()
+
+if __name__ == '__main__':
+    import logging
+    log.setLevel(logging.DEBUG)
+    import pdfreader
+    x = pdfreader.PdfReader('source.pdf')
+    y = PdfWriter()
+    for i, page in enumerate(x.pages):
+        print '  Adding page', i+1, '\r',
+        y.addpage(page)
+    print
+    y.write('result.pdf')
+    print
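The offset bookkeeping at the end of FormatObjects can be sketched in isolation. This is a minimal illustration, not the pdfrw code itself: `serialize` and its `bodies` parameter are hypothetical names, the object bodies are assumed to be already formatted as bytes, and the trailer dictionary is left out so that only the cross-reference arithmetic remains.

```python
def serialize(bodies, version="1.3"):
    """Concatenate numbered objects and build a matching xref table.

    Hypothetical helper: `bodies` is a list of already-formatted
    object bodies (bytes).  The trailer dictionary is omitted.
    """
    header = ("%%PDF-%s\n" % version).encode("ascii") + b"%\xe2\xe3\xcf\xd3\n"
    out = [header]
    offset = len(header)
    # Entry 0 is always the head of the free list.
    entries = [(0, 65535, "f")]
    for number, body in enumerate(bodies, 1):
        objbytes = ("%d 0 obj\n" % number).encode("ascii") + body + b"\nendobj\n"
        # Record where this object starts, then advance past it.
        entries.append((offset, 0, "n"))
        offset += len(objbytes)
        out.append(objbytes)
    # After the loop, `offset` is exactly where the xref section begins.
    xref = ["xref\n0 %d\n" % len(entries)]
    xref += ["%010d %05d %s\r\n" % entry for entry in entries]
    xref.append("startxref\n%d\n%%%%EOF\n" % offset)
    out.append("".join(xref).encode("ascii"))
    return b"".join(out), entries


data, entries = serialize([b"<< /Type /Catalog >>", b"[1 2 3]"])
```

Each in-use entry points at the first byte of its `N 0 obj` line, and the number after `startxref` points at the `xref` keyword, which is why the counts have to be tracked while writing.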
--- a/pdfrw/tokens.py
+++ b/pdfrw/tokens.py
@ -0,0 +1,228 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2012 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+A tokenizer for PDF streams.
+
+In general, documentation used was "PDF reference",
+sixth edition, for PDF version 1.7, dated November 2006.
+
+'''
+
+from __future__ import generators
+
+import re
+import itertools
+from pdfrw.objects import PdfString, PdfObject
+from pdfrw.errors import log, PdfParseError
+
+def linepos(fdata, loc):
+    line = fdata.count('\n', 0, loc) + 1
+    line += fdata.count('\r', 0, loc) - fdata.count('\r\n', 0, loc)
+    col = loc - max(fdata.rfind('\n', 0, loc), fdata.rfind('\r', 0, loc))
+    return line, col
+
+class PdfTokens(object):
+
+    # Table 3.1, page 50 of reference, defines whitespace
+    eol = '\n\r'
+    whitespace = '\x00 \t\f' + eol
+
+    # Text on page 50 defines delimiter characters
+    # Escape the ]
+    delimiters = r'()<>{}[\]/%'
+
+    # "normal" stuff is all but delimiters or whitespace.
+
+    p_normal = r'(?:[^\\%s%s]+|\\[^%s])+' % (whitespace, delimiters, whitespace)
+
+    p_comment = r'\%%[^%s]*' % eol
+
+    # This will get the bulk of literal strings.
+    p_literal_string = r'\((?:[^\\()]+|\\.)*[()]?'
+
+    # This will get more pieces of literal strings
+    # (Don't ask me why, but it hangs without the trailing ?.)
+    p_literal_string_extend = r'(?:[^\\()]+|\\.)*[()]?'
+
+    # A hex string.  This one's easy.
+    p_hex_string = r'\<[%s0-9A-Fa-f]*\>' % whitespace
+
+    p_dictdelim = r'\<\<|\>\>'
+    p_name = r'/[^%s%s]*' % (delimiters, whitespace)
+
+    p_catchall = '[^%s]' % whitespace
+
+    pattern = '|'.join([p_normal, p_name, p_hex_string, p_dictdelim, p_literal_string, p_comment, p_catchall])
+    findtok = re.compile('(%s)[%s]*' % (pattern, whitespace), re.DOTALL).finditer
+    findparen = re.compile('(%s)[%s]*' % (p_literal_string_extend, whitespace), re.DOTALL).finditer
+    splitname = re.compile(r'\#([0-9A-Fa-f]{2})').split
+
+    def _cacheobj(cache, obj, constructor):
+        ''' This caching relies on the constructors
+            returning something that will compare as
+            equal to the original obj.  This works
+            fine with our PDF objects.
+        '''
+        result = cache.get(obj)
+        if result is None:
+            result = constructor(obj)
+            cache[result] = result
+        return result
+
+    def fixname(self, cache, token, constructor, splitname=splitname, join=''.join, cacheobj=_cacheobj):
+        ''' Inside name tokens, a '#' character indicates that
+            the next two bytes are hex characters to be used
+            to form the 'real' character.
+        '''
+        substrs = splitname(token)
+        if '#' in join(substrs[::2]):
+            self.warning('Invalid /Name token')
+            return token
+        substrs[1::2] = (chr(int(x, 16)) for x in substrs[1::2])
+        result = cacheobj(cache, join(substrs), constructor)
+        result.encoded = token
+        return result
+
+    def _gettoks(self, startloc, cacheobj=_cacheobj,
+                       delimiters=delimiters, findtok=findtok, findparen=findparen,
+                       PdfString=PdfString, PdfObject=PdfObject):
+        ''' Given a source data string and a location inside it,
+            _gettoks generates tokens.  Each token is yielded on its own;
+            its (<starting file loc>, <ending file loc>) span is kept in
+            self.current[0].  The ending file loc is past any trailing whitespace.
+
+            The main complication here is the literal strings, which
+            can contain nested parentheses.  In order to cope with these
+            we can discard the current iterator and loop back to the
+            top to get a fresh one.
+
+            We could use re.search instead of re.finditer, but that's slower.
+        '''
+        fdata = self.fdata
+        current = self.current = [(startloc, startloc)]
+        namehandler = (cacheobj, self.fixname)
+        cache = {}
+        while 1:
+            for match in findtok(fdata, current[0][1]):
+                current[0] = tokspan = match.span()
+                token = match.group(1)
+                firstch = token[0]
+                if firstch not in delimiters:
+                    token = cacheobj(cache, token, PdfObject)
+                elif firstch in '/<(%':
+                    if firstch == '/':
+                        # PDF Name
+                        token = namehandler['#' in token](cache, token, PdfObject)
+                    elif firstch == '<':
+                        # << dict delim, or < hex string >
+                        if token[1:2] != '<':
+                            token = cacheobj(cache, token, PdfString)
+                    elif firstch == '(':
+                        # Literal string
+                        # It's probably simple, but maybe not
+                        # Nested parentheses are a bear, and if
+                        # they are present, we exit the for loop
+                        # and get back in with a new starting location.
+                        ends = None  # For broken strings
+                        if fdata[match.end(1)-1] != ')':
+                            nest = 2
+                            m_start, loc = tokspan
+                            for match in findparen(fdata, loc):
+                                loc = match.end(1)
+                                ending = fdata[loc-1] == ')'
+                                nest += 1 - ending * 2
+                                if not nest:
+                                    break
+                                if ending and ends is None:
+                                    ends = loc, match.end(), nest
+                            token = fdata[m_start:loc]
+                            current[0] = m_start, match.end()
+                            if nest:
+                                # There is one possible recoverable error seen in
+                                # the wild -- some stupid generators don't escape (.
+                                # If this happens, just terminate on first unescaped ).
+                                # The string won't be quite right, but that's a science
+                                # fair project for another time.
+                                (self.error, self.exception)[not ends]('Unterminated literal string')
+                                loc, ends, nest = ends
+                                token = fdata[m_start:loc] + ')' * nest
+                                current[0] = m_start, ends
+                        token = cacheobj(cache, token, PdfString)
+                    elif firstch == '%':
+                        # Comment
+                        if self.strip_comments:
+                            continue
+                    else:
+                        self.exception('Tokenizer logic incorrect -- should never get here')
+
+                yield token
+                if current[0] is not tokspan:
+                    break
+            else:
+                if self.strip_comments:
+                    break
+                raise StopIteration
+
+    def __init__(self, fdata, startloc=0, strip_comments=True):
+        self.fdata = fdata
+        self.strip_comments = strip_comments
+        self.iterator = iterator = self._gettoks(startloc)
+        self.next = iterator.next
+
+    def setstart(self, startloc):
+        ''' Change the starting location.
+        '''
+        current = self.current
+        if startloc != current[0][1]:
+            current[0] = startloc, startloc
+
+    def floc(self):
+        ''' Return the current file position
+            (where the next token will be retrieved)
+        '''
+        return self.current[0][1]
+    floc = property(floc, setstart)
+
+    def tokstart(self):
+        ''' Return the file position of the most
+            recently retrieved token.
+        '''
+        return self.current[0][0]
+    tokstart = property(tokstart, setstart)
+
+    def __iter__(self):
+        return self.iterator
+
+    def multiple(self, count, islice=itertools.islice, list=list):
+        ''' Retrieve multiple tokens
+        '''
+        return list(islice(self, count))
+
+    def next_default(self, default='nope'):
+        for result in self:
+            return result
+        return default
+
+    def msg(self, msg, *arg):
+        if arg:
+            msg %= arg
+        fdata = self.fdata
+        begin, end = self.current[0]
+        line, col = linepos(fdata, begin)
+        if end > begin:
+            tok = fdata[begin:end].rstrip()
+            if len(tok) > 30:
+                tok = tok[:26] + ' ...'
+            return '%s (line=%d, col=%d, token=%s)' % (msg, line, col, repr(tok))
+        return '%s (line=%d, col=%d)' % (msg, line, col)
+
+    def warning(self, *arg):
+        log.warning(self.msg(*arg))
+
+    def error(self, *arg):
+        log.error(self.msg(*arg))
+
+    def exception(self, *arg):
+        raise PdfParseError(self.msg(*arg))
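The `#xx` handling in fixname relies on `re.split` with a capturing group, which puts the captured hex pairs in the odd slots of the result. A minimal standalone sketch follows; `decode_name` is a hypothetical name, and unlike fixname it returns None for a malformed name instead of logging a warning and returning the token unchanged.

```python
import re

# Splitting on a pattern with one capturing group interleaves the results:
# even slots hold literal text, odd slots hold the captured hex pairs.
_split_name = re.compile(r"#([0-9A-Fa-f]{2})").split


def decode_name(token):
    """Decode #xx escapes in a PDF name token (hypothetical helper)."""
    parts = _split_name(token)
    # A '#' left over in the literal text was not followed by two hex digits.
    if "#" in "".join(parts[::2]):
        return None
    # Replace each hex pair with the character it encodes.
    parts[1::2] = [chr(int(pair, 16)) for pair in parts[1::2]]
    return "".join(parts)
```

For example, `decode_name("/A#20B")` yields the name with a space between `A` and `B`. The validity check runs before substitution, so an escaped `#23` is not mistaken for a malformed escape.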
--- a/pdfrw/toreportlab.py
+++ b/pdfrw/toreportlab.py
@ -0,0 +1,139 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+Converts pdfrw objects into reportlab objects.
+
+Designed for and tested with rl 2.3.
+
+Knows too much about reportlab internals.
+What can you do?
+
+The interface to this module is the makerl() function.
+
+Parameters:
+        canv        - a reportlab "canvas" (also accepts a "document")
+        pdfobj      - a pdfrw PDF object
+
+Returns:
+        A corresponding reportlab object, or if the
+        object is a PDF Form XObject, the name to
+        use with reportlab for the object.
+
+        Will recursively convert all necessary objects.
+        Be careful when converting a page -- if /Parent is set,
+        will recursively convert all pages!
+
+Notes:
+    1) Original objects are annotated with a
+        derived_rl_obj attribute which points to the
+        reportlab object.  This keeps multiple reportlab
+        objects from being generated for the same pdfobj
+        via repeated calls to makerl.  This is great for
+        not putting too many objects into the
+        new PDF, but not so good if you are modifying
+        objects for different pages.  Then you
+        need to do your own deep copying (of circular
+        structures).  You're on your own.
+
+    2) ReportLab seems weird about FormXObjects.
+       They pass around a partial name instead of the
+       object or a reference to it.  So we have to
+       reach into reportlab and get a number for
+       a unique name.  I guess this is so that
+       you can combine page streams with
+       impunity, but that's just a guess.
+
+    3) Updated 1/23/2010 to handle multipass documents
+       (e.g. with a table of contents).  These have
+       a different doc object on every pass.
+
+'''
+
+from reportlab.pdfbase import pdfdoc as rldocmodule
+from pdfrw.objects import PdfDict, PdfArray, PdfName
+
+RLStream = rldocmodule.PDFStream
+RLDict = rldocmodule.PDFDictionary
+RLArray = rldocmodule.PDFArray
+
+
+def _makedict(rldoc, pdfobj):
+    rlobj = rldict = RLDict()
+    if pdfobj.indirect:
+        rlobj.__RefOnly__ = 1
+        rlobj = rldoc.Reference(rlobj)
+    pdfobj.derived_rl_obj[rldoc] = rlobj, None
+
+    for key, value in pdfobj.iteritems():
+        rldict[key[1:]] = makerl_recurse(rldoc, value)
+
+    return rlobj
+
+def _makestream(rldoc, pdfobj, xobjtype=PdfName.XObject):
+    rldict = RLDict()
+    rlobj = RLStream(rldict, pdfobj.stream)
+
+    if pdfobj.Type == xobjtype:
+        shortname = 'pdfrw_%s' % (rldoc.objectcounter+1)
+        fullname = rldoc.getXObjectName(shortname)
+    else:
+        shortname = fullname = None
+    result = rldoc.Reference(rlobj, fullname)
+    pdfobj.derived_rl_obj[rldoc] = result, shortname
+
+    for key, value in pdfobj.iteritems():
+        rldict[key[1:]] = makerl_recurse(rldoc, value)
+
+    return result
+
+def _makearray(rldoc, pdfobj):
+    rlobj = rlarray = RLArray([])
+    if pdfobj.indirect:
+        rlobj.__RefOnly__ = 1
+        rlobj = rldoc.Reference(rlobj)
+    pdfobj.derived_rl_obj[rldoc] = rlobj, None
+
+    mylist = rlarray.sequence
+    for value in pdfobj:
+        mylist.append(makerl_recurse(rldoc, value))
+
+    return rlobj
+
+def _makestr(rldoc, pdfobj):
+    assert isinstance(pdfobj, (float, int, str)), repr(pdfobj)
+    return pdfobj
+
+def makerl_recurse(rldoc, pdfobj):
+    docdict = getattr(pdfobj, 'derived_rl_obj', None)
+    if docdict is not None:
+        value = docdict.get(rldoc)
+        if value is not None:
+            return value[0]
+    if isinstance(pdfobj, PdfDict):
+        if pdfobj.stream is not None:
+            func = _makestream
+        else:
+            func = _makedict
+        if docdict is None:
+            pdfobj.private.derived_rl_obj = {}
+    elif isinstance(pdfobj, PdfArray):
+        func = _makearray
+        if docdict is None:
+            pdfobj.derived_rl_obj = {}
+    else:
+        func = _makestr
+    return func(rldoc, pdfobj)
+
+def makerl(canv, pdfobj):
+    try:
+        rldoc = canv._doc
+    except AttributeError:
+        rldoc = canv
+    rlobj = makerl_recurse(rldoc, pdfobj)
+    try:
+        name = pdfobj.derived_rl_obj[rldoc][1]
+    except AttributeError:
+        name = None
+    return name or rlobj
--- a/pdfrw/uncompress.py
+++ b/pdfrw/uncompress.py
@ -0,0 +1,52 @@
+# A part of pdfrw (pdfrw.googlecode.com)
+# Copyright (C) 2006-2009 Patrick Maupin, Austin, Texas
+# MIT license -- See LICENSE.txt for details
+
+'''
+Currently, this sad little file only knows how to decompress
+using the flate (zlib) algorithm.  Maybe more later, but it's
+not a priority for me...
+'''
+import zlib
+from pdfrw.objects import PdfDict, PdfName
+from pdfrw.errors import log
+
+def streamobjects(mylist, isinstance=isinstance, PdfDict=PdfDict):
+    for obj in mylist:
+        if isinstance(obj, PdfDict) and obj.stream is not None:
+            yield obj
+
+def uncompress(mylist, warnings=set(), flate = PdfName.FlateDecode,
+                    decompress=zlib.decompressobj, isinstance=isinstance, list=list, len=len):
+    ok = True
+    for obj in streamobjects(mylist):
+        ftype = obj.Filter
+        if ftype is None:
+            continue
+        if isinstance(ftype, list) and len(ftype) == 1:
+            # todo: multiple filters
+            ftype = ftype[0]
+        parms = obj.DecodeParms
+        if ftype != flate or parms is not None:
+            msg = 'Not decompressing: cannot use filter %s with parameters %s' % (repr(ftype), repr(parms))
+            if msg not in warnings:
+                warnings.add(msg)
+                log.warning(msg)
+            ok = False
+        else:
+            dco = decompress()
+            error = None
+            try:
+                data = dco.decompress(obj.stream)
+            except Exception, s:
+                error = str(s)
+            if error is None:
+                assert not dco.unconsumed_tail
+                if dco.unused_data.strip():
+                    error = 'Unconsumed compression data: %s' % repr(dco.unused_data[:20])
+            if error is None:
+                obj.Filter = None
+                obj.stream = data
+            else:
+                log.error('%s %s' % (error, repr(obj.indirect)))
+    return ok
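The flate path in uncompress() comes down to a `zlib.decompressobj()` call followed by a check that nothing meaningful trails the compressed data. A minimal sketch of that check, assuming a hypothetical `inflate_stream` helper that raises instead of logging; the `flush()` call is an addition not present in the code above.

```python
import zlib


def inflate_stream(stream):
    """Inflate a FlateDecode stream body (hypothetical helper).

    Trailing whitespace after the compressed data is tolerated;
    anything else is reported as an error.
    """
    dco = zlib.decompressobj()
    data = dco.decompress(stream)
    data += dco.flush()  # not in the original; drains any buffered output
    # Bytes after the end of the zlib stream land in unused_data.
    if dco.unused_data.strip():
        raise ValueError("Unconsumed compression data: %r" % dco.unused_data[:20])
    return data


payload = b"hello pdf " * 10
restored = inflate_stream(zlib.compress(payload) + b"\r\n")
```

Tolerating trailing whitespace matters because the end-of-line marker before `endstream` can end up inside the stream bytes.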
--- a/setup.py
+++ b/setup.py
@ -0,0 +1,38 @@
+#!/usr/bin/env python
+
+from distutils.core import setup
+
+setup(
+    name='pdfrw',
+    version='0.1',
+    description='PDF file reader/writer library',
+    long_description='''
+pdfrw lets you read and write PDF files, including
+compositing multiple pages together (e.g. to do watermarking,
+or to copy an image or diagram from one PDF to another),
+and can output by itself, or in conjunction with reportlab.
+
+pdfrw will faithfully reproduce vector formats without
+rasterization, so the rst2pdf package has used pdfrw
+by default for PDF and SVG images since
+March 2010.  Several small examples are provided.
+''',
+    author='Patrick Maupin',
+    author_email='pmaupin@gmail.com',
+    platforms="Independent",
+    url='http://code.google.com/p/pdfrw/',
+    packages=['pdfrw', 'pdfrw.objects'],
+    license="MIT",
+    classifiers=[
+        'Development Status :: 4 - Beta',
+        'Environment :: Console',
+        'Intended Audience :: Developers',
+        'License :: OSI Approved :: MIT License',
+        'Operating System :: OS Independent',
+        'Programming Language :: Python',
+        'Topic :: Multimedia :: Graphics :: Graphics Conversion',
+        'Topic :: Software Development :: Libraries',
+        'Topic :: Utilities'
+    ],
+    keywords='pdf vector graphics',
+)
--- a/tests/__init__.py
+++ b/tests/__init__.py
@ -0,0 +1 @@
+# This file intentionally left blank.
--- a/tests/test_pdfstring.py
+++ b/tests/test_pdfstring.py
@ -0,0 +1,37 @@
+'''
+Run from the directory above like so:
+python -m tests.test_pdfstring
+'''
+
+
+import pdfrw
+import unittest
+
+
+class TestEncoding(unittest.TestCase):
+
+    @staticmethod
+    def decode(value):
+        return pdfrw.objects.PdfString(value).decode()
+
+    @staticmethod
+    def encode(value):
+        return str(pdfrw.objects.PdfString.encode(value))
+
+    @classmethod
+    def encode_decode(cls, value):
+        return cls.decode(cls.encode(value))
+
+    def roundtrip(self, value):
+        self.assertEqual(value, self.encode_decode(value))
+
+    def test_doubleslash(self):
+        self.roundtrip('\\')
+
+
+def main():
+    unittest.main()
+
+
+if __name__ == '__main__':
+    main()