🤝 merge with dev

chfw 2018-08-22 18:52:46 +01:00
commit 42c4fac89b
14 changed files with 268 additions and 14 deletions

View File

@ -5,6 +5,11 @@
{%block description%}
**pyexcel-{{file_type}}** is a tiny wrapper library to read, manipulate and write data in {{file_type}} format. It can also read the xlsx and xlsm formats. You are likely to use it with `pyexcel <https://github.com/pyexcel/pyexcel>`_.
New flag: `detect_merged_cells` spreads the same value across all cells of a merged range. Be aware that this may slow down reading.
New flag: `skip_hidden_row_and_column` skips hidden rows and columns and defaults to **True**. It may slow down reading and is only valid for 'xls' files. For 'xlsx' files, please use pyexcel-xlsx.
{%endblock%}
{%block extras %}

View File

@ -1,7 +1,44 @@
Change log
================================================================================
0.6.0 - unreleased
0.5.7 - 15.03.2018
--------------------------------------------------------------------------------
Added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. `pyexcel#54 <https://github.com/pyexcel/pyexcel/issues/54>`_, the workbook's
Book.datemode attribute should always be passed.
0.5.6 - 15.03.2018
--------------------------------------------------------------------------------
Added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. `pyexcel#120 <https://github.com/pyexcel/pyexcel/issues/120>`_, xlwt cannot
save a book without any sheet. So, let's raise an exception in this case in
order to warn the developers.
0.5.5 - 8.11.2017
--------------------------------------------------------------------------------
Added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. `#25 <https://github.com/pyexcel/pyexcel-xls/issues/25>`_, detect merged cell
in .xls
0.5.4 - 2.11.2017
--------------------------------------------------------------------------------
Added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. `#24 <https://github.com/pyexcel/pyexcel-xls/issues/24>`_, xlsx format cannot
use skip_hidden_row_and_column. Please use pyexcel-xlsx instead.
0.5.3 - 2.11.2017
--------------------------------------------------------------------------------
Added
@ -10,6 +47,27 @@ Added
#. `#21 <https://github.com/pyexcel/pyexcel-xls/issues/21>`_, skip hidden rows
and columns under 'skip_hidden_row_and_column' flag.
0.5.2 - 23.10.2017
--------------------------------------------------------------------------------
updated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. pyexcel `pyexcel#105 <https://github.com/pyexcel/pyexcel/issues/105>`_,
remove gease from setup_requires, introduced by 0.5.1.
#. remove python2.6 test support
#. update its dependency on pyexcel-io to 0.5.3
0.5.1 - 20.10.2017
--------------------------------------------------------------------------------
added
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#. `pyexcel#103 <https://github.com/pyexcel/pyexcel/issues/103>`_, include
LICENSE file in MANIFEST.in, meaning LICENSE file will appear in the released
tarball.
0.5.0 - 30.08.2017
--------------------------------------------------------------------------------

View File

@ -20,6 +20,11 @@ pyexcel-xls - Let you focus on data, instead of xls format
**pyexcel-xls** is a tiny wrapper library to read, manipulate and write data in xls format. It can also read the xlsx and xlsm formats. You are likely to use it with `pyexcel <https://github.com/pyexcel/pyexcel>`_.
New flag: `detect_merged_cells` spreads the same value across all cells of a merged range. Be aware that this may slow down reading.
New flag: `skip_hidden_row_and_column` skips hidden rows and columns and defaults to **True**. It may slow down reading and is only valid for 'xls' files. For 'xlsx' files, please use pyexcel-xlsx.
Known constraints
==================

View File

@ -1,12 +1,52 @@
name: pyexcel-xls
organisation: pyexcel
releases:
- changes:
- action: Added
details:
- "`pyexcel#54`, the workbook's Book.datemode attribute should always be passed."
date: 15.03.2018
version: 0.5.7
- changes:
- action: Added
details:
- "`pyexcel#120`, xlwt cannot save a book without any sheet. So, let's raise an exception in this case in order to warn the developers."
date: 15.03.2018
version: 0.5.6
- changes:
- action: Added
details:
- '`#25`, detect merged cell in .xls'
date: 8.11.2017
version: 0.5.5
- changes:
- action: Added
details:
- '`#24`, xlsx format cannot use skip_hidden_row_and_column. Please use pyexcel-xlsx
instead.'
date: 2.11.2017
version: 0.5.4
- changes:
- action: Added
details:
- '`#21`, skip hidden rows and columns under ''skip_hidden_row_and_column'' flag.'
date: unreleased
version: 0.6.0
date: 2.11.2017
version: 0.5.3
- changes:
- action: updated
details:
- pyexcel `pyexcel#105`, remove gease from setup_requires, introduced by 0.5.1.
- remove python2.6 test support
- update its dependency on pyexcel-io to 0.5.3
date: 23.10.2017
version: 0.5.2
- changes:
- action: added
details:
- '`pyexcel#103`, include LICENSE file in MANIFEST.in, meaning LICENSE file will
appear in the released tarball.'
date: 20.10.2017
version: 0.5.1
- changes:
- action: Updated
details:

View File

@ -6,7 +6,7 @@ current_version: 0.5.8
release: 0.5.7
file_type: xls
dependencies:
- pyexcel-io>=0.5.0
- pyexcel-io>=0.5.3
- xlrd
- xlwt
description: A wrapper library to read, manipulate and write data in xls format. It reads xlsx and xlsm format

View File

@ -12,7 +12,7 @@ import xlrd
from pyexcel_io.book import BookReader
from pyexcel_io.sheet import SheetReader
from pyexcel_io._compact import OrderedDict
from pyexcel_io._compact import OrderedDict, irange
from pyexcel_io.service import has_no_digits_in_float
@ -24,17 +24,38 @@ XLS_KEYWORDS = [
DEFAULT_ERROR_VALUE = '#N/A'
class MergedCell(object):
    def __init__(self, row_low, row_high, column_low, column_high):
        self.__rl = row_low
        self.__rh = row_high
        self.__cl = column_low
        self.__ch = column_high
        self.value = None
    def register_cells(self, registry):
        for rowx in irange(self.__rl, self.__rh):
            for colx in irange(self.__cl, self.__ch):
                key = "%s-%s" % (rowx, colx)
                registry[key] = self
class XLSheet(SheetReader):
    """
    xls, xlsx, xlsm sheet reader
    Currently only support first sheet in the file
    """
    def __init__(self, sheet, auto_detect_int=True, **keywords):
    def __init__(self, sheet, auto_detect_int=True, date_mode=0, **keywords):
        SheetReader.__init__(self, sheet, **keywords)
        self.__auto_detect_int = auto_detect_int
        self.__hidden_cols = []
        self.__hidden_rows = []
        self.__merged_cells = {}
        self._book_date_mode = date_mode
        if keywords.get('detect_merged_cells') is True:
            for merged_cell_ranges in sheet.merged_cells:
                merged_cells = MergedCell(*merged_cell_ranges)
                merged_cells.register_cells(self.__merged_cells)
        if keywords.get('skip_hidden_row_and_column') is True:
            for col_index, info in self._native_sheet.colinfo_map.items():
                if info.hidden == 1:
@ -63,16 +84,26 @@ class XLSheet(SheetReader):
        """
        Random access to the xls cells
        """
        row, column = self._offset_hidden_indices(row, column)
        if self._keywords.get('skip_hidden_row_and_column') is True:
            row, column = self._offset_hidden_indices(row, column)
        cell_type = self._native_sheet.cell_type(row, column)
        value = self._native_sheet.cell_value(row, column)
        if cell_type == xlrd.XL_CELL_DATE:
            value = xldate_to_python_date(value)
            value = xldate_to_python_date(value, self._book_date_mode)
        elif cell_type == xlrd.XL_CELL_NUMBER and self.__auto_detect_int:
            if has_no_digits_in_float(value):
                value = int(value)
        elif cell_type == xlrd.XL_CELL_ERROR:
            value = DEFAULT_ERROR_VALUE
        if self.__merged_cells:
            merged_cell = self.__merged_cells.get("%s-%s" % (row, column))
            if merged_cell:
                if merged_cell.value:
                    value = merged_cell.value
                else:
                    merged_cell.value = value
        return value
    def _offset_hidden_indices(self, row, column):
@ -100,6 +131,7 @@ class XLSBook(BookReader):
        self._file_content = None
        self.__skip_hidden_sheets = True
        self.__skip_hidden_row_column = True
        self.__detect_merged_cells = False
    def open(self, file_name, **keywords):
        self.__parse_keywords(**keywords)
@ -118,6 +150,7 @@ class XLSBook(BookReader):
        self.__skip_hidden_sheets = keywords.get('skip_hidden_sheets', True)
        self.__skip_hidden_row_column = keywords.get(
            'skip_hidden_row_and_column', True)
        self.__detect_merged_cells = keywords.get('detect_merged_cells', False)
    def close(self):
        if self._native_book:
@ -148,7 +181,8 @@ class XLSBook(BookReader):
        return result
    def read_sheet(self, native_sheet):
        sheet = XLSheet(native_sheet, **self._keywords)
        sheet = XLSheet(native_sheet, date_mode=self._native_book.datemode,
                        **self._keywords)
        return {sheet.name: sheet.to_array()}
    def _get_book(self, on_demand=False):
@ -164,7 +198,9 @@ class XLSBook(BookReader):
            xlrd_params['file_contents'] = self._file_content
        else:
            raise IOError("No valid file name or file content found.")
        if self.__skip_hidden_row_column:
        if self.__skip_hidden_row_column and self._file_type == 'xls':
            xlrd_params['formatting_info'] = True
        if self.__detect_merged_cells:
            xlrd_params['formatting_info'] = True
        xls_book = xlrd.open_workbook(**xlrd_params)
        return xls_book
@ -178,11 +214,12 @@ class XLSBook(BookReader):
        return params
def xldate_to_python_date(value):
def xldate_to_python_date(value, date_mode):
    """
    convert xl date to python date
    """
    date_tuple = xlrd.xldate_as_tuple(value, 0)
    date_tuple = xlrd.xldate_as_tuple(value, date_mode)
    ret = None
    if date_tuple == (0, 0, 0, 0, 0, 0):
        ret = datetime.datetime(1900, 1, 1, 0, 0, 0)
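
The merged-cell handling added in this file can be sketched on its own, without xlrd. The block below is an illustration under stated assumptions, not the library's code: `spread_merged_values` and the `grid` input are hypothetical, `range` stands in for `irange`, and ranges are assumed to be `(row_low, row_high, column_low, column_high)` with exclusive upper bounds. It also tests `is not None` rather than truthiness, so a merged value of 0 would still be spread; that differs from the `if merged_cell.value:` check in the commit.

```python
class MergedCell(object):
    """One merged range; upper bounds are exclusive (an assumption)."""

    def __init__(self, row_low, row_high, column_low, column_high):
        self.rows = range(row_low, row_high)
        self.columns = range(column_low, column_high)
        self.value = None  # filled in by the first cell that is read

    def register_cells(self, registry):
        # Every cell in the range points at the same MergedCell object.
        for rowx in self.rows:
            for colx in self.columns:
                registry["%s-%s" % (rowx, colx)] = self


def spread_merged_values(grid, merged_ranges):
    """Hypothetical helper: copy each range's first-read value to its cells."""
    registry = {}
    for merged_range in merged_ranges:
        MergedCell(*merged_range).register_cells(registry)

    result = []
    for rowx, row in enumerate(grid):
        new_row = []
        for colx, value in enumerate(row):
            merged_cell = registry.get("%s-%s" % (rowx, colx))
            if merged_cell is not None:
                if merged_cell.value is not None:
                    value = merged_cell.value  # reuse the remembered value
                else:
                    merged_cell.value = value  # first cell read in the range
            new_row.append(value)
        result.append(new_row)
    return result


# Column 0 of rows 0-2 is merged; only the top-left cell carries the value.
grid = [[1, 2, 3], ['', 5, 6], ['', 8, 9]]
print(spread_merged_values(grid, [(0, 3, 0, 1)]))
```

Because cells are read row by row from the top left, the first cell visited in each range is the one that holds the stored value, so a single shared object per range is enough to carry it to the rest.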

View File

@ -18,6 +18,7 @@ from pyexcel_io.sheet import SheetWriter
DEFAULT_DATE_FORMAT = "DD/MM/YY"
DEFAULT_TIME_FORMAT = "HH:MM:SS"
DEFAULT_DATETIME_FORMAT = "%s %s" % (DEFAULT_DATE_FORMAT, DEFAULT_TIME_FORMAT)
EMPTY_SHEET_NOT_ALLOWED = "xlwt does not support a book without any sheets"
class XLSheetWriter(SheetWriter):
@ -76,6 +77,12 @@ class XLSWriter(BookWriter):
        self.work_book = Workbook(style_compression=style_compression,
                                  encoding=encoding)
    def write(self, incoming_dict):
        if incoming_dict:
            BookWriter.write(self, incoming_dict)
        else:
            raise NotImplementedError(EMPTY_SHEET_NOT_ALLOWED)
    def create_sheet(self, name):
        return XLSheetWriter(self.work_book, None, name)

View File

@ -1,3 +1,3 @@
pyexcel-io>=0.5.0
pyexcel-io>=0.5.3
xlrd
xlwt

View File

@ -42,7 +42,7 @@ CLASSIFIERS = [
]
INSTALL_REQUIRES = [
'pyexcel-io>=0.5.0',
'pyexcel-io>=0.5.3',
'xlrd',
'xlwt',
]

BIN
tests/fixtures/complex-merged-cells-sheet.xls vendored Executable file

Binary file not shown.

BIN
tests/fixtures/merged-cell-sheet.xls vendored Executable file

Binary file not shown.

BIN
tests/fixtures/merged-sheet-exploration.xls vendored Executable file

Binary file not shown.

View File

@ -7,6 +7,8 @@
import os
import pyexcel as pe
from pyexcel_xls import save_data
from pyexcel_xls.xlsr import xldate_to_python_date
from pyexcel_xls.xlsw import XLSWriter as Writer
from _compact import OrderedDict
from nose.tools import eq_, raises
from nose import SkipTest
@ -98,5 +100,20 @@ def test_issue_151():
    eq_('#N/A', s[0,0])
@raises(NotImplementedError)
def test_empty_book_pyexcel_issue_120():
    """
    https://github.com/pyexcel/pyexcel/issues/120
    """
    writer = Writer()
    writer.write({})
def test_pyexcel_issue_54():
    xlvalue = 41071.0
    date = xldate_to_python_date(xlvalue, 1)
    eq_(date, datetime.date(2016, 6, 12))
def get_fixture(file_name):
    return os.path.join("tests", "fixtures", file_name)
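
The expected value in `test_pyexcel_issue_54` can be checked by hand with the standard library. This is only a sketch of the date arithmetic, not xlrd's implementation: `serial_to_date` is a hypothetical helper, it ignores any time fraction, and for the 1900 date system it assumes serials of 61 or more (past the fictitious 29 February 1900).

```python
import datetime

# Epochs for the two Excel date systems (a workbook's date mode is 0 or 1).
EPOCH_1900 = datetime.date(1899, 12, 30)  # only valid for serials >= 61
EPOCH_1904 = datetime.date(1904, 1, 1)


def serial_to_date(serial, date_mode):
    """Hypothetical helper: whole-day Excel serial number to a date."""
    if date_mode == 1:
        epoch = EPOCH_1904
    else:
        if serial < 61:
            raise ValueError("1900-system serials below 61 are not handled")
        epoch = EPOCH_1900
    return epoch + datetime.timedelta(days=int(serial))


print(serial_to_date(41071.0, 1))  # 1904 system, as in the test
print(serial_to_date(41071.0, 0))  # same serial under the old hard-coded mode
```

The first call gives 2016-06-12, matching the test, while the second gives 2012-06-11. The two systems are 1,462 days apart, which is why the workbook's date mode has to be passed through rather than fixed at 0.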

View File

@ -0,0 +1,85 @@
import os
from pyexcel_xls import get_data
from pyexcel_xls.xlsr import MergedCell
from nose.tools import eq_
def test_merged_cells():
    data = get_data(
        get_fixture("merged-cell-sheet.xls"),
        detect_merged_cells=True,
        library="pyexcel-xls")
    expected = [[1, 2, 3], [1, 5, 6], [1, 8, 9], [10, 11, 11]]
    eq_(data['Sheet1'], expected)
def test_complex_merged_cells():
    data = get_data(
        get_fixture("complex-merged-cells-sheet.xls"),
        detect_merged_cells=True,
        library="pyexcel-xls")
    expected = [
        [1, 1, 2, 3, 15, 16, 22, 22, 24, 24],
        [1, 1, 4, 5, 15, 17, 22, 22, 24, 24],
        [6, 7, 8, 9, 15, 18, 22, 22, 24, 24],
        [10, 11, 11, 12, 19, 19, 23, 23, 24, 24],
        [13, 11, 11, 14, 20, 20, 23, 23, 24, 24],
        [21, 21, 21, 21, 21, 21, 23, 23, 24, 24],
        [25, 25, 25, 25, 25, 25, 25, 25, 25, 25],
        [25, 25, 25, 25, 25, 25, 25, 25, 25, 25]
    ]
    eq_(data['Sheet1'], expected)
def test_exploration():
    data = get_data(
        get_fixture("merged-sheet-exploration.xls"),
        detect_merged_cells=True,
        library="pyexcel-xls")
    expected_sheet1 = [
        [1, 1, 1, 1, 1, 1],
        [2],
        [2],
        [2],
        [2],
        [2],
        [2],
        [2],
        [2],
        [2]]
    eq_(data['Sheet1'], expected_sheet1)
    expected_sheet2 = [
        [3],
        [3],
        [3],
        [3, 4, 4, 4, 4, 4, 4],
        [3],
        [3],
        [3]]
    eq_(data['Sheet2'], expected_sheet2)
    expected_sheet3 = [
        ['', '', '', '', '', 2, 2, 2],
        [],
        [],
        [],
        ['', '', '', 5],
        ['', '', '', 5],
        ['', '', '', 5],
        ['', '', '', 5],
        ['', '', '', 5]]
    eq_(data['Sheet3'], expected_sheet3)
def test_merged_cell_class():
    test_dict = {}
    merged_cell = MergedCell(1, 4, 1, 4)
    merged_cell.register_cells(test_dict)
    keys = sorted(list(test_dict.keys()))
    expected = ['1-1', '1-2', '1-3', '2-1',
                '2-2', '2-3', '3-1', '3-2', '3-3']
    eq_(keys, expected)
    eq_(merged_cell, test_dict['3-1'])
def get_fixture(file_name):
    return os.path.join("tests", "fixtures", file_name)