Commit Graph

430 Commits

Author SHA1 Message Date
Matthew Stamy fcd0ac0192 Merge pull request #238 from im2703/master
PageObject.mergePage() fails for some pages with images (Python3)
2016-04-17 21:44:59 -05:00
Matthew Stamy b9948ff393 Merge pull request #253 from VyacheslavHashov/master
Python 3 compatibility with inline images
2016-04-17 21:43:15 -05:00
Matthew Stamy 8ca3a9b17a Merge pull request #258 from BenRussert/master
Ignore xref table zero index error if self.strict = False
2016-04-17 21:33:26 -05:00
Matthew Stamy 531f8ac4de Merge pull request #257 from GuruLabs/guru/createpdf_invalid_ref
Added a fallback for creating PDFs that have invalidly referenced objects (provides a fix for broken PDF objects)
2016-04-17 21:26:53 -05:00
BenRussert c3b1ac2f2a Ignore xref table zero index error if self.strict = False 2016-04-08 21:27:32 -05:00
Rob Oakes 7ca8fc6de5 Merge branch 'master' of https://github.com/mstamy2/PyPDF2 into guru/createpdf_invalid_ref 2016-04-06 12:50:10 -06:00
ctate 231aaf68dd Working around unresolved objects and returning NullObject instead of raising a ValueError. 2016-04-08 09:04:49 -06:00
VyacheslavHashov c15eadb25c Python 3 compatibility with inline images 2016-03-14 19:33:06 +03:00
Sylvain Pelissier b0ace625c1 Correct test for python3 2016-01-21 13:48:11 +01:00
Sylvain Pelissier 1273824c0f Add CCITTFax Decode and JPEG test 2016-01-21 13:42:17 +01:00
Sylvain Pelissier efae6bcae6 Update README.md
Travis CI picture.
2016-01-13 16:37:53 +01:00
Sylvain Pelissier 19a8872010 Testing 2016-01-13 10:52:27 +01:00
Sylvain Pelissier 7bc62cd896 PDF extraction error handling 2016-01-13 09:16:56 +01:00
Sylvain Pelissier c83cbd87e7 Merge pull request #1 from maphew/master
Image extractor script with sample failing pdf
2016-01-07 08:29:57 +01:00
Matt Wilkie eeb2b659aa Fails with "ValueError: not enough image data"
```
> python pdf-image-extractor.py ..\PDF_Samples\GeoBase_NHNC1_Data_Model_UML_EN.pdf
Traceback (most recent call last):
  File "pdf-image-extractor.py", line 33, in <module>
    img = Image.frombytes(mode, size, data)
  File "C:\Python27\ArcGIS10.3\lib\site-packages\PIL\Image.py", line 2047, in frombytes
    im.frombytes(data, decoder_name, args)
  File "C:\Python27\ArcGIS10.3\lib\site-packages\PIL\Image.py", line 731, in frombytes
    raise ValueError("not enough image data")
ValueError: not enough image data
```

Source:
http://ftp2.cits.rncan.gc.ca/pub/geobase/official/nhn_rhn/doc/

"""
All distributed data are subject to the Open Government Licence – Canada.

Canada grants to the licensee a non-exclusive, fully paid, royalty-free
right and licence to exercise all intellectual property rights in the
data. This includes the right to use, incorporate, sublicense (with
further right of sublicensing), modify, improve, further develop, and
distribute the Data; and to manufacture or distribute derivative
products.

-- http://www.nrcan.gc.ca/earth-sciences/geography/topographic-information/free-data-geogratis/licence/17285
"""
2016-01-06 11:40:14 -08:00
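One possible cause of the "not enough image data" error above is a stream that holds fewer bytes than the image mode and size imply. The following is a minimal sketch of a pre-check, assuming uncompressed raw pixel data with byte-aligned rows; `expected_raw_size` and `has_enough_data` are hypothetical helpers, not part of PIL or of the extractor script.

```python
# Bits per pixel for a few common PIL image modes (not exhaustive).
BITS_PER_PIXEL = {"1": 1, "L": 8, "P": 8, "RGB": 24, "RGBA": 32, "CMYK": 32}


def expected_raw_size(mode, size):
    """Return the byte count raw pixel data should have for this mode and size.

    Assumes each row is padded up to a whole number of bytes.
    """
    width, height = size
    bytes_per_row = (width * BITS_PER_PIXEL[mode] + 7) // 8
    return bytes_per_row * height


def has_enough_data(mode, size, data):
    """Return True if data is long enough to fill an image of this mode and size."""
    return len(data) >= expected_raw_size(mode, size)
```

Calling `has_enough_data(mode, size, data)` before `Image.frombytes(mode, size, data)` lets a script skip or report a short stream instead of failing with a traceback.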
Matt Wilkie ba3da42d68 Extract images from PDF without resampling or altering.
Adapted from work by Sylvain Pelissier (@sylvainpelissier)
http://stackoverflow.com/questions/2693820/extract-images-from-pdf-without-resampling-in-python

Script works but has limited range of image types it is successful with.
Future commits will have sample PDFs and notes about what works/fails.
2016-01-06 11:26:40 -08:00
Igor Mihaljevic b06ac57a6f Python2/3 compatibility on merging pages with eps img into single page 2015-12-14 23:39:46 +01:00
Sylvain Pelissier 39de327cd9 JPEG 2000 filter added 2015-12-10 10:18:47 +01:00
Sylvain Pelissier 7b591a285d JPEG sample 2015-12-05 11:44:08 +01:00
Sylvain Pelissier 098394a3b3 /DCTDecode stream data 2015-12-05 11:17:12 +01:00
Matthew Stamy 0900101f83 Merge pull request #221 from louib/parameterized_js
Parameterized JavaScript.
2015-09-23 14:31:53 -05:00
Louis-Bertrand Varin ab9395cc5b Adding unit tests for addJS. 2015-08-24 21:50:53 -04:00
Henri Salo 48193975e5 Prevent infinite loop in readObject() function. Patch by dhudson1. Closes mstamy2/PyPDF2#184 2015-08-18 13:42:22 +03:00
Louis-Bertrand Varin 5052688261 Parameterized JavaScript.
According to the PDF documentation, an entry in the document's name dictionary
that lists the JavaScript actions is required to execute a parameterized
function call.

Also note that:
> The names are arbitrary and need not bear any relation to the
> JavaScript name space.

In this case, the name is _0000000000_.
2015-08-15 00:20:35 -04:00
Matthew Stamy 7456f0acea Stronger equality test for resource values. Fixes #182 2015-07-23 16:25:37 -05:00
Matthew Stamy cf269ddfa9 update changelog for patch 2015-07-20 15:11:09 -05:00
Matthew Stamy d0e08b90f5 Conform to semantic versioning. Patch number added 2015-07-20 14:23:51 -05:00
Rob Oakes c9c95a512a Merge branch 'master' of https://github.com/mstamy2/PyPDF2 2015-07-16 17:05:43 -06:00
Matthew Stamy fc05b046c0 Smarter inline image parsing 2015-07-15 11:50:44 -05:00
Matthew Stamy 736dc27453 Replace usage of Str with isString 2015-07-09 14:26:39 -05:00
Matthew Stamy e87538baf1 Version 1.25 2015-07-07 16:05:22 -05:00
Matthew Stamy 80551fa094 Merge pull request #172 from jerickbixly/master
Python3 support for ASCII85Decode
2015-07-06 14:09:36 -05:00
Matthew Stamy 9022c7db14 Merge pull request #211 from speedplane/master
Fix "Stream has ended unexpectedly" for Name Objects
2015-06-30 15:40:29 -05:00
speedplane bf7339863e Also, fix up this regex, which appeared to be totally broken for all but the simplest cases. 2015-06-30 09:06:31 -04:00
speedplane 431ba70920 Fix a bug which could result in a "Stream has ended unexpectedly" error being raised unnecessarily if a Name object runs right up against the end of a file stream. 2015-06-30 08:36:55 -04:00
Matthew Stamy ee0ace64b1 Merge pull request #210 from underdogio/dev/copy.encryption.sqwished
Added decryption key copying for PdfFileMerger
2015-06-26 14:42:46 -05:00
Todd Wolfson 541963c54b Added decryption key copying for PdfFileMerger 2015-06-26 14:09:12 -05:00
Matthew Stamy 7ea13fcbea Merge branch 'master' of https://github.com/mstamy2/PyPDF2 2015-06-18 12:50:29 -05:00
Matthew Stamy 8a144a3e2f Provide exception instead of assert false 2015-06-18 12:49:22 -05:00
Matthew Stamy 0b7f9a7d66 Merge pull request #209 from AlmightyOatmeal/patch-1
Add abbreviated short-names for filters
2015-06-18 11:59:16 -05:00
Jamie Ivanov afccc8fc94 Add abbreviated short-names for filters
After investigating an odd error:

# NotImplementedError: unsupported filter /Fl

I saw the PDF had this line:

# <</Filter/Fl/First 12/Length 828/N 2/Type/ObjStm>>stream

But most objects had:

# <</Size 306/Filter/FlateDecode/Length 947/Type/XRef/W[1 3 1]

After looking at the filters.py file, I saw the short-names were not added. After modifying the filters.py, the code is up and running again.

A thanks to Matthew Weiss for helping with this as well.
2015-06-18 10:00:23 -05:00
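The short-name handling described in the commit above can be sketched as a lookup from abbreviated filter names to their full names. This is a minimal illustration and not the actual filters.py change: `FILTER_ABBREVIATIONS`, `normalize_filter_name`, and `decode_stream` are hypothetical names, and only /FlateDecode is decoded here.

```python
import zlib

# Abbreviated filter names and their full equivalents from the PDF specification.
FILTER_ABBREVIATIONS = {
    "/AHx": "/ASCIIHexDecode",
    "/A85": "/ASCII85Decode",
    "/LZW": "/LZWDecode",
    "/Fl": "/FlateDecode",
    "/RL": "/RunLengthDecode",
    "/CCF": "/CCITTFaxDecode",
    "/DCT": "/DCTDecode",
}


def normalize_filter_name(name):
    """Return the full filter name for an abbreviation; pass other names through."""
    return FILTER_ABBREVIATIONS.get(name, name)


def decode_stream(filter_name, data):
    """Hypothetical dispatcher; this sketch handles only /FlateDecode."""
    full_name = normalize_filter_name(filter_name)
    if full_name == "/FlateDecode":
        return zlib.decompress(data)
    raise NotImplementedError("unsupported filter %s" % full_name)
```

With this lookup in place, a stream declared as `/Filter/Fl` takes the same path as one declared as `/Filter/FlateDecode` instead of raising `NotImplementedError`.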
Matthew Stamy 969d6ef94c Merge pull request #122 from mozbugbox/get-page-number
Add method to get page number from Page/Outline objects
2015-06-17 14:42:24 -05:00
Matthew Stamy 6f1c5284df Read extra initial whitespace when reading object from stream. Resolves #204 2015-06-17 14:15:37 -05:00
Matthew Stamy 33d7f71ac4 Merge pull request #208 from peircej/master
Separate extracted text fields with EOLs
2015-06-17 13:16:03 -05:00
Jon Peirce 8271888434 Separate extracted text fields with EOLs
Each "TJ" entry is a separate piece of text so provide some way for the user to separate them in the extracted text. This way the user can then txt.split("\n") to get a list of all text blocks
2015-06-17 17:51:17 +01:00
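The splitting approach mentioned in the commit above might look like the following sketch. The sample string is made up for illustration and stands in for text returned by the extractor; `split_text_blocks` is a hypothetical helper.

```python
def split_text_blocks(text):
    """Split extracted text on newlines and drop empty pieces."""
    return [block for block in text.split("\n") if block]


# Made-up sample standing in for extracted text with one block per "TJ" entry.
extracted = "Invoice 42\nTotal due\n19.99\n"
blocks = split_text_blocks(extracted)
```

Filtering out empty strings avoids a trailing empty block when the extracted text ends with a newline.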
Matthew Stamy 1cdcf7ebee Merge branch 'GuruLabs-roakes/guru_enhancements' 2015-06-16 15:55:21 -05:00
Matthew Stamy ac67ab6251 resolved merge conflict 2015-06-16 15:54:26 -05:00
Matthew Stamy 894b8d1916 Merge branch 'bamrhein-utils_fixes' 2015-06-15 15:21:16 -05:00
Matthew Stamy eb93deb3cd sys.maxint does not exist in Py 3 2015-06-15 15:19:32 -05:00
Matthew Stamy c2af8a0c6c Utilize isString 2015-06-15 15:01:21 -05:00