Improve reader interface (#90)

* 🔨 improve reader interface

* 🔨 shrink reader code

* This is an auto-commit, updating project meta data, such as changelog.rst, contributors.rst

* 🔥 remove redundant functionalities that will never be used

* 📚 updated doc string and the tutorial

* 🔨 update import statements

* 🔬 more test coverage

* This is an auto-commit, updating project meta data, such as changelog.rst, contributors.rst

* 💚 fix unit test failure

* 📚 update reader plugin example

* 💄 update coding style

* 📚 fix index rst file

* This is an auto-commit, updating project meta data, such as changelog.rst, contributors.rst

Co-authored-by: chfw <chfw@users.noreply.github.com>
jaska 2020-10-04 22:15:23 +01:00 committed by GitHub
parent 29c26680a7
commit fa808870d3
GPG Key ID: 4AEE18F83AFDEB23
33 changed files with 337 additions and 76 deletions

View File

@ -6,5 +6,5 @@ targets:
- setup.py: io_setup.py.jj2
- .travis.yml: custom_travis.yml.jj2
- README.rst: io_readme.rst.jj2
- "docs/source/index.rst": "docs/source/index.rst"
- "docs/source/index.rst": "docs/source/index.rst.jj2"
- .gitignore: gitignore.jj2

View File

@ -1,4 +1,5 @@
5 contributors
================================================================================

View File

@ -1,3 +1,31 @@
Extend pyexcel-io Tutorial
================================================================================
pyexcel-io itself comes with csv support.
Reader
--------------------------------------------------------------------------------
Suppose we have a yaml file containing a dictionary whose values are
two-dimensional arrays. The task is to write a reader plugin for pyexcel-io so that
we can use get_data() to read it.
Example yaml data::
.. literalinclude:: ../../examples/test.yaml
:language: yaml
Example code::
.. literalinclude:: ../../examples/custom_yeaml_reader.py
:language: python
Writer
--------------------------------------------------------------------------------
Working with xls, xlsx, and ods formats
================================================================================

View File

@ -3,7 +3,16 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
{%include "header.rst.jj2" %}
`pyexcel-io` - Let you focus on data, instead of file formats
================================================================================
:Author: chfw
:Source code: http://github.com/pyexcel/pyexcel-io.git
:Issues: http://github.com/pyexcel/pyexcel-io/issues
:License: New BSD License
:Development: |release|
:Released: |version|
:Generated: |today|
Introduction
--------------------------------------------------------------------------------
@ -33,11 +42,104 @@ as of 2014. They are invented and supported by `pyexcel-io`_.
Installation
--------------------------------------------------------------------------------
{%include "installation.rst.jj2" %}
You can install pyexcel-io via pip:
.. code-block:: bash
$ pip install pyexcel-io
or clone it and install it:
.. code-block:: bash
$ git clone https://github.com/pyexcel/pyexcel-io.git
$ cd pyexcel-io
$ python setup.py install
For individual excel file formats, please install them as you wish:
{%include "io-plugins-list.rst.jj2" %}
.. _file-format-list:
.. _a-map-of-plugins-and-file-formats:
.. table:: A list of file formats supported by external plugins
    ======================== ======================= ================= ==================
    Package name             Supported file formats  Dependencies      Python versions
    ======================== ======================= ================= ==================
    `pyexcel-io`_ >=v0.6.0   csv, csvz [#f1]_, tsv,                    3.6+
                             tsvz [#f2]_
    `pyexcel-io`_ <=0.5.20   same as above                             2.6, 2.7, 3.3,
                                                                       3.4, 3.5, 3.6
                                                                       pypy
    `pyexcel-xls`_           xls, xlsx(read only),   `xlrd`_,          same as above
                             xlsm(read only)         `xlwt`_
    `pyexcel-xlsx`_          xlsx                    `openpyxl`_       same as above
    `pyexcel-ods3`_          ods                     `pyexcel-ezodf`_, 2.6, 2.7, 3.3, 3.4
                                                     lxml              3.5, 3.6
    `pyexcel-ods`_           ods                     `odfpy`_          same as above
    ======================== ======================= ================= ==================
.. table:: Dedicated file reader and writers
    ======================== ======================= ================= ==================
    Package name             Supported file formats  Dependencies      Python versions
    ======================== ======================= ================= ==================
    `pyexcel-xlsxw`_         xlsx(write only)        `XlsxWriter`_     Python 2 and 3
    `pyexcel-xlsxr`_         xlsx(read only)         lxml              same as above
    `pyexcel-xlsbr`_         xlsb(read only)         pyxlsb            same as above
    `pyexcel-odsr`_          read only for ods, fods lxml              same as above
    `pyexcel-odsw`_          write only for ods      loxun             same as above
    `pyexcel-htmlr`_         html(read only)         lxml,html5lib     same as above
    `pyexcel-pdfr`_          pdf(read only)          pdftables         Python 2 only.
    ======================== ======================= ================= ==================
Plugin shopping guide
------------------------
Unlike csv files, xlsx and ods files are zip archives of a folder containing many
xml files; xls is a binary format.
The dedicated readers for excel files can stream read.
In order to manage the list of plugins installed, you need to use pip to add or remove
a plugin. When you use virtualenv, you can have different plugins per virtual
environment. In the situation where you have multiple plugins that do the same thing
in your environment, you need to tell pyexcel which plugin to use per function call.
For example, if both pyexcel-ods and pyexcel-odsr are installed and you want get_array
to use pyexcel-odsr, pass the library argument: get_array(..., library='pyexcel-odsr').
.. _pyexcel-io: https://github.com/pyexcel/pyexcel-io
.. _pyexcel-xls: https://github.com/pyexcel/pyexcel-xls
.. _pyexcel-xlsx: https://github.com/pyexcel/pyexcel-xlsx
.. _pyexcel-ods: https://github.com/pyexcel/pyexcel-ods
.. _pyexcel-ods3: https://github.com/pyexcel/pyexcel-ods3
.. _pyexcel-odsr: https://github.com/pyexcel/pyexcel-odsr
.. _pyexcel-odsw: https://github.com/pyexcel/pyexcel-odsw
.. _pyexcel-pdfr: https://github.com/pyexcel/pyexcel-pdfr
.. _pyexcel-xlsxw: https://github.com/pyexcel/pyexcel-xlsxw
.. _pyexcel-xlsxr: https://github.com/pyexcel/pyexcel-xlsxr
.. _pyexcel-xlsbr: https://github.com/pyexcel/pyexcel-xlsbr
.. _pyexcel-htmlr: https://github.com/pyexcel/pyexcel-htmlr
.. _xlrd: https://github.com/python-excel/xlrd
.. _xlwt: https://github.com/python-excel/xlwt
.. _openpyxl: https://bitbucket.org/openpyxl/openpyxl
.. _XlsxWriter: https://github.com/jmcnamara/XlsxWriter
.. _pyexcel-ezodf: https://github.com/pyexcel/pyexcel-ezodf
.. _odfpy: https://github.com/eea/odfpy
.. rubric:: Footnotes
.. [#f1] zipped csv file
.. [#f2] zipped tsv file
After that, you can start getting and saving data in the loaded format. There
may be two plugins for the same file format, e.g. pyexcel-ods3 and pyexcel-ods.
@ -91,7 +193,6 @@ get_data(.., library='pyexcel-ods')
csvz
sqlalchemy
django
options
extensions

View File

@ -0,0 +1,45 @@
import yaml
from pyexcel_io import get_data
from pyexcel_io.sheet import NamedContent
from pyexcel_io.plugins import IOPluginInfoChainV2
from pyexcel_io.plugin_api import ISheet, IReader


class YourSingleSheet(ISheet):
    def __init__(self, your_native_sheet):
        self.two_dimensional_array = your_native_sheet

    def row_iterator(self):
        yield from self.two_dimensional_array

    def column_iterator(self, row):
        yield from row


class YourReader(IReader):
    def __init__(self, file_name, file_type, **keywords):
        self.file_handle = open(file_name, "r")
        # safe_load needs no Loader argument, which newer PyYAML requires for yaml.load
        self.native_book = yaml.safe_load(self.file_handle)
        # one NamedContent per sheet: name is the key, payload is the 2D array
        self.content_array = [
            NamedContent(key, values)
            for key, values in self.native_book.items()
        ]

    def read_sheet(self, sheet_index):
        two_dimensional_array = self.content_array[sheet_index].payload
        return YourSingleSheet(two_dimensional_array)

    def close(self):
        self.file_handle.close()


# register the reader for files with the "yaml" file type
IOPluginInfoChainV2(__name__).add_a_reader(
    relative_plugin_class_path="YourReader",
    locations=["file"],
    file_types=["yaml"],
    stream_type="text",
)

if __name__ == "__main__":
    data = get_data("test.yaml")
    print(data)

11
examples/test.yaml Normal file
View File

@ -0,0 +1,11 @@
sheet 1:
  - - 1
    - 2
    - 3
  - - 2
    - 3
    - 4
sheet 2:
  - - A
    - B
    - C

View File

@ -13,7 +13,7 @@ dependencies:
- lml>=0.0.4
test_dependencies:
- pyexcel
- pyexcel-xls
- pyexcel-xls==0.5.9
- SQLAlchemy
- pyexcel-xlsxw
extra_dependencies:

View File

@ -51,8 +51,4 @@ def is_string(atype):
if atype == str:
return True
elif PY2:
if atype == unicode:
return True
return False

View File

@ -7,8 +7,8 @@
:copyright: (c) 2014-2020 by Onni Software Ltd.
:license: New BSD License, see LICENSE for more details
"""
from pyexcel_io.plugin_api import IReader
from pyexcel_io.database.querysets import QuerysetsReader
from pyexcel_io.plugin_api.abstract_reader import IReader
class DjangoModelReader(QuerysetsReader):

View File

@ -1,5 +1,5 @@
from pyexcel_io.plugin_api import IReader
from pyexcel_io.database.querysets import QuerysetsReader
from pyexcel_io.plugin_api.abstract_reader import IReader
class QueryReader(IReader):

View File

@ -7,8 +7,8 @@
:copyright: (c) 2014-2020 by Onni Software Ltd.
:license: New BSD License, see LICENSE for more details
"""
from pyexcel_io.plugin_api import IReader
from pyexcel_io.database.querysets import QuerysetsReader
from pyexcel_io.plugin_api.abstract_reader import IReader
class SQLTableReader(QuerysetsReader):

View File

@ -11,8 +11,7 @@ import logging
import pyexcel_io.constants as constants
from pyexcel_io.utils import is_empty_array, swap_empty_string_for_none
from pyexcel_io.plugin_api.abstract_sheet import ISheetWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
from pyexcel_io.plugin_api import IWriter, ISheetWriter
log = logging.getLogger(__name__)

View File

@ -9,8 +9,7 @@
"""
import pyexcel_io.constants as constants
from pyexcel_io.utils import is_empty_array, swap_empty_string_for_none
from pyexcel_io.plugin_api.abstract_sheet import ISheetWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
from pyexcel_io.plugin_api import IWriter, ISheetWriter
class PyexcelSQLSkipRowException(Exception):

View File

@ -32,8 +32,11 @@ class QuerysetsReader(ISheet):
if len(self.__query_sets) == 0:
yield []
for element in ISheet.to_array(self):
yield element
for row in self.row_iterator():
row_values = []
for value in self.column_iterator(row):
row_values.append(value)
yield row_values
def column_iterator(self, row):
if self.__column_names is None:

View File

@ -0,0 +1,3 @@
from .abstract_sheet import ISheet, ISheetWriter # noqa: F401
from .abstract_reader import IReader # noqa: F401
from .abstract_writer import IWriter # noqa: F401

View File

@ -1,9 +1,18 @@
from pyexcel_io._compact import OrderedDict
from .abstract_sheet import ISheet
class IReader(object):
def read_all(self):
result = OrderedDict()
for index, sheet in enumerate(self.content_array):
result.update({sheet.name: self.read_sheet(index).to_array()})
return result
"""
content_array should be a list of NamedContent
where: name is the sheet name,
payload is the native sheet.
"""
def read_sheet(self, sheet_index) -> ISheet:
raise NotImplementedError("")
def sheet_names(self):
return [content.name for content in self.content_array]
def __len__(self):
return len(self.content_array)
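
The reworked IReader derives `sheet_names()` and `__len__()` from `content_array`, so a plugin only has to fill that list and implement `read_sheet()`. A minimal sketch of that contract, using stand-ins instead of the real pyexcel_io classes (the `NamedContent` namedtuple and `DictReader` here are hypothetical, written only for illustration):

```python
from collections import namedtuple

# Stand-in for pyexcel_io.sheet.NamedContent: a sheet name plus its native payload.
NamedContent = namedtuple("NamedContent", ["name", "payload"])


class DictReader:
    """Hypothetical reader over a dict of sheet name -> list of rows."""

    def __init__(self, book):
        self.content_array = [NamedContent(k, v) for k, v in book.items()]

    def read_sheet(self, sheet_index):
        # A real plugin would wrap the payload in an ISheet subclass here.
        return self.content_array[sheet_index].payload

    def sheet_names(self):
        return [content.name for content in self.content_array]

    def __len__(self):
        return len(self.content_array)


reader = DictReader({"sheet 1": [[1, 2, 3]], "sheet 2": [["A", "B", "C"]]})
print(reader.sheet_names(), len(reader))
```

Callers can then stay unaware of `content_array` and rely on the two derived methods alone.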

View File

@ -1,15 +1,15 @@
class ISheet(object):
def to_array(self):
data = []
for row in self.row_iterator():
my_row = []
for element in self.column_iterator(row):
my_row.append(element)
data.append(my_row)
return data
def row_iterator(self):
raise NotImplementedError("")
def column_iterator(self, row):
raise NotImplementedError("")
class ISheetWriter(object):
def write_row(self, data_row):
raise NotImplementedError("How does your sheet write a row of data")
def write_array(self, table):
"""
For standalone usage, write an array
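
With `to_array` removed from ISheet, a sheet plugin exposes only `row_iterator` and `column_iterator`, and the caller assembles the nested list. A rough sketch of that assembly step, assuming stand-in names (`ListSheet` and `sheet_to_array` are hypothetical and not part of pyexcel_io):

```python
class ListSheet:
    """Stand-in for an ISheet implementation backed by a list of lists."""

    def __init__(self, rows):
        self.rows = rows

    def row_iterator(self):
        yield from self.rows

    def column_iterator(self, row):
        yield from row


def sheet_to_array(sheet):
    """Walk rows, then the cells of each row, collecting them into lists."""
    return [list(sheet.column_iterator(row)) for row in sheet.row_iterator()]


result = sheet_to_array(ListSheet([[1, 2, 3], [2, 3, 4]]))
print(result)
```

Keeping the interface down to two iterators means a plugin never has to materialise a whole sheet itself.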

View File

@ -1,4 +1,10 @@
from .abstract_sheet import ISheetWriter
class IWriter(object):
def create_sheet(self, sheet_name) -> ISheetWriter:
raise NotImplementedError("Please implement a native sheet writer")
def write(self, incoming_dict):
for sheet_name in incoming_dict:
sheet_writer = self.create_sheet(sheet_name)

View File

@ -76,38 +76,25 @@ class Reader(object):
"""
read a named sheet from an excel data book
"""
for index, content in enumerate(self.reader.content_array):
if content.name == sheet_name:
return {content.name: self.read_sheet(index)}
else:
raise ValueError("Cannot find sheet %s" % sheet_name)
sheet_names = self.reader.sheet_names()
index = sheet_names.index(sheet_name)
def read_sheet(self, sheet_index):
sheet_reader = self.reader.read_sheet(sheet_index)
sheet = EncapsulatedSheetReader(sheet_reader, **self.keywords)
return sheet.to_array()
return self.read_sheet_by_index(index)
def read_sheet_by_index(self, sheet_index):
"""
read an indexed sheet from an excel data book
"""
try:
name = self.reader.content_array[sheet_index].name
return {name: self.read_sheet(sheet_index)}
except IndexError:
self.close()
raise
sheet_reader = self.reader.read_sheet(sheet_index)
sheet_names = self.reader.sheet_names()
sheet = EncapsulatedSheetReader(sheet_reader, **self.keywords)
return {sheet_names[sheet_index]: sheet.to_array()}
def read_all(self):
"""
read everything from an excel data book
"""
result = OrderedDict()
for index, sheet in enumerate(self.reader.content_array):
result.update(
{self.reader.content_array[index].name: self.read_sheet(index)}
)
for sheet_index in range(len(self.reader)):
content_dict = self.read_sheet_by_index(sheet_index)
result.update(content_dict)
return result
def read_many(self, sheets):
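
The new lookups in this hunk go through `sheet_names()` and `len()` rather than touching `content_array` directly. The sketch below mimics that flow with a hypothetical `FakeReader` and free functions; it is an illustration under those assumptions, not the library's own Reader class. Note that a missing sheet name now surfaces as the `ValueError` raised by `list.index`.

```python
from collections import OrderedDict


class FakeReader:
    """Hypothetical plugin reader exposing only the new interface."""

    def __init__(self, book):
        self._names = list(book)
        self._sheets = list(book.values())

    def sheet_names(self):
        return self._names

    def read_sheet(self, sheet_index):
        return self._sheets[sheet_index]

    def __len__(self):
        return len(self._names)


def read_sheet_by_index(reader, sheet_index):
    # An out-of-range index propagates as IndexError.
    return {reader.sheet_names()[sheet_index]: reader.read_sheet(sheet_index)}


def read_sheet_by_name(reader, sheet_name):
    # list.index raises ValueError when the name is missing.
    return read_sheet_by_index(reader, reader.sheet_names().index(sheet_name))


def read_all(reader):
    result = OrderedDict()
    for sheet_index in range(len(reader)):
        result.update(read_sheet_by_index(reader, sheet_index))
    return result


book = FakeReader({"sheet 1": [[1, 2, 3]], "sheet 2": [["A", "B", "C"]]})
print(read_sheet_by_name(book, "sheet 2"))
print(list(read_all(book)))
```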

View File

@ -4,8 +4,8 @@ import glob
from pyexcel_io import constants
from pyexcel_io.sheet import NamedContent
from pyexcel_io.plugin_api import IReader
from pyexcel_io.readers.csv_sheet import CSVFileReader
from pyexcel_io.plugin_api.abstract_reader import IReader
DEFAULT_NEWLINE = "\r\n"

View File

@ -3,8 +3,8 @@ import re
import pyexcel_io._compact as compact
from pyexcel_io import constants
from pyexcel_io.sheet import NamedContent
from pyexcel_io.plugin_api import IReader
from pyexcel_io.readers.csv_sheet import CSVinMemoryReader
from pyexcel_io.plugin_api.abstract_reader import IReader
DEFAULT_SHEET_SEPARATOR_FORMATTER = f"---{constants.DEFAULT_NAME}---%s"

View File

@ -1,5 +1,5 @@
"""
pyexcel_io.readers.csvr
pyexcel_io.readers.csv_sheet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
csv file reader
@ -12,7 +12,7 @@ import csv
import pyexcel_io.service as service
import pyexcel_io._compact as compact
import pyexcel_io.constants as constants
from pyexcel_io.plugin_api.abstract_sheet import ISheet
from pyexcel_io.plugin_api import ISheet
DEFAULT_SEPARATOR = "__"
DEFAULT_SHEET_SEPARATOR_FORMATTER = "---%s---" % constants.DEFAULT_NAME + "%s"

View File

@ -1,6 +1,6 @@
from pyexcel_io import constants
from pyexcel_io.plugin_api import IWriter
from pyexcel_io.writers.csv_sheet import CSVFileWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
class CsvFileWriter(IWriter):

View File

@ -1,6 +1,6 @@
from pyexcel_io import constants
from pyexcel_io.plugin_api import IWriter
from pyexcel_io.writers.csv_sheet import CSVMemoryWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
class CsvMemoryWriter(IWriter):

View File

@ -1,5 +1,5 @@
"""
pyexcel_io.writers.csvw
pyexcel_io.writers.csv_sheet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The lower level csv file format writer
@ -10,7 +10,7 @@
import csv
import pyexcel_io.constants as constants
from pyexcel_io.plugin_api.abstract_sheet import ISheetWriter
from pyexcel_io.plugin_api import ISheetWriter
class CSVFileWriter(ISheetWriter):

View File

@ -1,6 +1,6 @@
"""
pyexcel_io.fileformat.csvz
~~~~~~~~~~~~~~~~~~~~~~~~~~~
pyexcel_io.fileformat.csvz_sheet
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The lower level csvz file format handler.

View File

@ -1,8 +1,8 @@
import zipfile
from pyexcel_io import constants
from pyexcel_io.plugin_api import IWriter
from pyexcel_io.writers.csvz_sheet import CSVZipSheetWriter
from pyexcel_io.plugin_api.abstract_writer import IWriter
class CsvZipWriter(IWriter):

View File

@ -10,6 +10,6 @@ pygments
moban
moban_jinja2_github
pyexcel
pyexcel-xls
pyexcel-xls==0.5.9
SQLAlchemy
pyexcel-xlsxw

View File

@ -33,7 +33,9 @@ class TestReaders(TestCase):
sheet.get_file_handle()
def test_sheet_file_reader(self):
r = CSVFileReader(NamedContent(self.file_type, self.test_file))
r = EncapsulatedSheetReader(
CSVFileReader(NamedContent(self.file_type, self.test_file))
)
result = list(r.to_array())
self.assertEqual(result, self.expected_data)
@ -42,7 +44,9 @@ class TestReaders(TestCase):
with open(self.test_file, "r") as f:
io.write(f.read())
io.seek(0)
r = CSVinMemoryReader(NamedContent(self.file_type, io))
r = EncapsulatedSheetReader(
CSVinMemoryReader(NamedContent(self.file_type, io))
)
result = list(r.to_array())
self.assertEqual(result, self.expected_data)
@ -136,7 +140,9 @@ class TestNonUniformCSV(TestCase):
def test_utf16_decoding():
test_file = os.path.join("tests", "fixtures", "csv-encoding-utf16.csv")
reader = CSVFileReader(NamedContent("csv", test_file), encoding="utf-16")
reader = EncapsulatedSheetReader(
CSVFileReader(NamedContent("csv", test_file), encoding="utf-16")
)
content = list(reader.to_array())
expected = [["Äkkilähdöt", "Matkakirjoituksia", "Matkatoimistot"]]
@ -160,8 +166,8 @@ def test_utf16_encoding():
def test_utf16_memory_decoding():
test_content = u"Äkkilähdöt,Matkakirjoituksia,Matkatoimistot"
test_content = BytesIO(test_content.encode("utf-16"))
reader = CSVinMemoryReader(
NamedContent("csv", test_content), encoding="utf-16"
reader = EncapsulatedSheetReader(
CSVinMemoryReader(NamedContent("csv", test_content), encoding="utf-16")
)
content = list(reader.to_array())

View File

@ -339,7 +339,7 @@ class TestMultipleModels:
exporter.append(adapter1)
exporter.append(adapter2)
reader = DjangoBookReader(exporter, "django")
result = reader.read_all()
result = read_all(reader)
for key in result:
result[key] = list(result[key])
eq_(result, self.content)
@ -411,3 +411,10 @@ def test_django_model_import_adapter():
adapter.column_names = ["a"]
adapter.row_initializer = "abc"
eq_(adapter.row_initializer, "abc")
def read_all(reader):
result = OrderedDict()
for index, sheet in enumerate(reader.content_array):
result.update({sheet.name: reader.read_sheet(index).to_array()})
return result

53
tests/test_plugin_api.py Normal file
View File

@ -0,0 +1,53 @@
from pyexcel_io.plugin_api import ISheet, IReader, IWriter, ISheetWriter

from nose.tools import raises


class TestISheet:
    def setUp(self):
        self.isheet = ISheet()

    @raises(NotImplementedError)
    def test_row_iterator(self):
        self.isheet.row_iterator()

    @raises(NotImplementedError)
    def test_column_iterator(self):
        self.isheet.column_iterator(1)


class TestISheetWriter:
    def setUp(self):
        self.isheet_writer = ISheetWriter()

    @raises(NotImplementedError)
    def test_write_row(self):
        self.isheet_writer.write_row([1, 2])


class TestIReader:
    def setUp(self):
        self.ireader = IReader()

    @raises(NotImplementedError)
    def test_read_sheet(self):
        self.ireader.read_sheet(1)


class TestIWriter:
    def setUp(self):
        self.iwriter = IWriter()

    @raises(NotImplementedError)
    def test_create_sheet(self):
        self.iwriter.create_sheet("a name")


@raises(Exception)
def test_empty_writer():
    class TestWriter(IWriter):
        def create_sheet(self, sheet_name):
            return None

    test_writer = TestWriter()
    test_writer.write({"sheet 1": [[1, 2]]})

View File

@ -230,7 +230,7 @@ class TestSingleWrite:
query_sets = mysession.query(Pyexcel).all()
query_reader = QueryReader(query_sets, None, column_names=self.data[0])
result = query_reader.read_all()
result = read_all(query_reader)
for key in result:
result[key] = list(result[key])
eq_(result, {"pyexcel_sheet1": self.results})
@ -471,7 +471,7 @@ class TestMultipleRead:
post_adapter = SQLTableExportAdapter(Post)
exporter.append(post_adapter)
reader = SQLBookReader(exporter, "sql")
result = reader.read_all()
result = read_all(reader)
for key in result:
result[key] = list(result[key])
@ -574,3 +574,10 @@ def test_unknown_sheet():
to_store = OrderedDict()
to_store.update({"you do not see me": [[]]})
writer.write(to_store)
def read_all(reader):
result = OrderedDict()
for index, sheet in enumerate(reader.content_array):
result.update({sheet.name: reader.read_sheet(index).to_array()})
return result