Coverage for pandalone\xlsreader.py : 97%

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
#
# Copyright 2014 European Commission (JRC);
# Licensed under the EUPL (the 'Licence');
# You may not use this work except in compliance with the Licence.
# You may obtain a copy of the Licence at: http://ec.europa.eu/idabc/eupl
"""
A mini-language to capture rectangular-ranges from Excel-sheets by scanning empty/full cells.
.. seealso:: Example spreadsheet: :download:`xls_ref.xlsx`
Excel-ref
=========

Syntax::

    <1st-cell>[:[<2nd-cell>][:<expansions>]][<filters>]

Annotated example::

        target-moves───┐
    cell-coords──────┐ │
                    ┌┤┌┴─┐
                    A1(RD):..(RD):L?DR{"fun": "df", "kws": {"header": false}}
                    └─┬──┘ └─┬──┘ └┬─┘└───────────────┬─────────────────────┘
    1st-cell-pos──────┘      │     │                  │
    2nd-cell-pos─────────────┘     │                  │
    range-expansions───────────────┘                  │
    filters───────────────────────────────────────────┘
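
As a rough illustration of how the four sections of the syntax fit together, the sketch below splits an xl-ref with a single regular expression. The names `split_xl_ref` and `_XL_REF_SKETCH` are hypothetical. The pattern is also deliberately looser than the module's own parser: it checks only the coarse shape of each section, not whether the coordinates or moves are valid.

```python
import json
import re

# Hypothetical, simplified pattern for:
#   <1st-cell>[:[<2nd-cell>][:<expansions>]][<filters>]
# Character classes like [(] and [0-9] stand in for backslash escapes.
_XL_REF_SKETCH = re.compile(
    r"^(?P<st>[A-Z_^]+[0-9_^]+(?:[(][LURD]{1,2}[)])?)"        # 1st cell-pos
    r"(?::(?P<nd>[A-Z_^.]+[0-9_^.]+(?:[(][LURD]{1,2}[)])?)?"  # 2nd cell-pos [opt]
    r"(?::(?P<exp>[LURD?0-9]+))?)?"                          # expansions [opt]
    r"(?P<filters>[{].*[}])?$",                              # json filters [opt]
    re.IGNORECASE,
)


def split_xl_ref(xl_ref):
    # Split an xl-ref into its raw sections; the filters are decoded from JSON.
    match = _XL_REF_SKETCH.match(xl_ref.strip())
    if not match:
        raise ValueError('Invalid xl-ref(%s)' % xl_ref)
    parts = match.groupdict()
    if parts['filters']:
        parts['filters'] = json.loads(parts['filters'])
    return parts


parts = split_xl_ref('A1(RD):..(RD):L?DR{"fun": "df", "kws": {"header": false}}')
print(parts['st'], parts['nd'], parts['exp'])  # A1(RD) ..(RD) L?DR
```

Missing sections come back as `None`. For example, ``split_xl_ref('_^(L)')`` only fills the 1st cell.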
Definitions
-----------

.. default-role:: term

.. glossary::

    excel-url
    xl-url
        Any url with its fragment abiding by the `excel-ref` syntax.
        Its file-part should resolve to an excel-file.

    excel-ref
    xl-ref
        The syntax for `capturing` ranges from excel-sheets,
        specified within the fragment-part of an `xl-url`.

    cell-pos
    cell-position
        A pair of row/col cell `coordinates` optionally followed by
        a parenthesized `target-moves`.
        It actually specifies 2 cells, `start-cell` and `target-cell`.

    coord
    coords
    coordinate
    coordinates
    cell-coords
    row-coordinate
    row-coord
    column-coordinate
    col-coord
        The cell-column (in letters) and cell-row (number) of a cell.

    absolute-coordinate
    absolute
        Any cell row/col identified with column-characters, row-numbers,
        or the following special-characters:

        - ``^`` The top/left full-cell `coordinate`.
        - ``_`` The bottom/right full-cell `coordinate`.

    dependent-coordinate
    dependent
        Any `2nd-cell` `coordinate` identified with a dot(``.``),
        which means that:

            > 2nd-start-cell coordinate = 1st target-cell coordinate

        The `2nd-cell` might contain a "mix" of `absolute` and *dependent*
        coordinates.

    primitive-directions
        The 4 *primitive-directions* are denoted with one of the letters
        ``LURD``.

    target-moves
    targeting
        A single or a pair of the 4 `primitive-directions` letters, specified
        inside the `cell-pos` parenthesis that follows the `coordinates`
        of the `start-cell`.
        The pairs ``UD`` and ``LR``, and their inverse, are invalid.

    start-cell
    start
        The cell identified by the `coordinates` of the `cell-pos` alone.

    target-cell
    target
        The cell identified after applying `target-moves` on the `start-cell`.
        Failure to identify a target-cell raises an error.

    1st-cell
    1st-cell-pos
    1st-start-cell
    1st-target-cell
        The `capturing` STARTS from the `target` of *this* `cell-pos`.
        It supports `absolute` coordinates only.

    2nd-cell
    2nd-cell-pos
    2nd-start-cell
    2nd-target-cell
        The `capturing` STOPS at the `target` of this `cell-pos`.
        It supports both `absolute` coordinates, and `dependent` ones
        from the `1st-target-cell`.

    capture-range
    range
        The sheet's rectangular area bounded by the `1st-target-cell`
        and the `2nd-target-cell`.

    capturing
    capture-moves
        The reading of the `capture-range` by traversing from
        the `1st-target-cell` to the `2nd-target-cell`.

    state
    cell-state
        Whether a cell is empty or full (non-empty).

    termination-rule
    target-termination-rule
        The condition for stopping `target-moves` while searching for
        a `target-cell`.
        It can be either `search-same` or `search-opposite`.

    search-same
        The `target-cell` is the LAST cell with the SAME `state` as
        the `start-cell`, while `targeting` from it.

    search-opposite
        The `target-cell` is the FIRST cell with OPPOSITE `state` from
        the `start-cell`, while `targeting` from it.

    range-expansions
    expansions
        How to expand the initial `capture-range`.
        It can be any combination of the ``LURD?`` letters,
        with repetitions.

    filter
    filters
    filter-function
    filter-functions
        Predefined functions, specified as nested *json* dictionaries,
        to apply for transforming the `capture-range`.

Target-moves
-------------
There are 12 `target-moves` named with a *single* or a *pair* of letters denoting the 4 primitive directions, ``LURD``::

            U
     UL◄───┐▲┌───►UR
    LU     │││     RU
     ▲     │││     ▲
     │     │││     │
     └─────┼│┼─────┘
    L◄──────X──────►R
     ┌─────┼│┼─────┐
     │     │││     │
     ▼     │││     ▼
    LD     │││     RD
     DL◄───┘▼└───►DR
            D

- The 'X' at the center marks the starting cell.

So a ``RD`` move means *"traverse cells first by rows then by columns"*,
or, in a lengthier description:

    > Start moving *right* till the 1st state change, and then
    > move *down* to the next row, and start traversing right again.
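
The 12 moves are exactly the 4 single letters plus the ordered letter-pairs whose letters lie on different axes. A minimal sketch of that rule follows. The names `PRIMITIVE_DIRS`, `is_valid_move()` and `all_target_moves()` are hypothetical, and the (row, col) deltas assume rows grow downwards, as in a sheet.

```python
from itertools import product

# (row, col) deltas of the 4 primitive directions; rows grow downwards.
PRIMITIVE_DIRS = {'L': (0, -1), 'U': (-1, 0), 'R': (0, 1), 'D': (1, 0)}


def _is_horizontal(letter):
    # L/R keep the row fixed, U/D keep the column fixed.
    return PRIMITIVE_DIRS[letter][0] == 0


def is_valid_move(move):
    # 1 or 2 primitive letters; a pair must mix a horizontal and a vertical
    # letter, so UD, DU, LR, RL (and repeats like LL) are rejected.
    move = move.upper()
    if not 1 <= len(move) <= 2 or any(c not in PRIMITIVE_DIRS for c in move):
        return False
    return len(move) == 1 or _is_horizontal(move[0]) != _is_horizontal(move[1])


def all_target_moves():
    singles = list(PRIMITIVE_DIRS)
    pairs = [a + b for a, b in product(singles, repeat=2) if is_valid_move(a + b)]
    return singles + pairs


print(len(all_target_moves()))  # 12
```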

Target-cells
------------

Using these moves we can identify a `target-cell` in relation to
the `start-cell`. For instance, given the xl-sheet below, there are
multiple ways to identify (or target) the non-empty values ``X``::

      A B C D E F
    1
    2
    3     X       ──────► C3    A1(RD)   _^(L)      F3(L)
    4         X   ──────► E4    A4(R)    _4(L)      D1(DR)
    5   X         ──────► B5    A1(DR)   A_(UR)     _5(L)
    6           X ──────► F6    __       _^(D)      A_(R)

- The 'X' signifies non-empty cells.

So we can target cells with "absolute coordinates", the usual ``A1`` notation,
augmented with the following special characters, which refer to the
columns/rows of the sheet with non-empty values:

- underscore (``_``) for the bottom/right ones, and
- caret (``^``) for the top/left ones.

When no ``LURD`` moves are specified, the target-cell coincides with the starting one.
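
The special characters can be resolved from the positions of all non-empty cells. The sketch below assumes the sheet is already available as a boolean NumPy array, like the one `get_full_cells()` returns. It mirrors the nested ``{'col': {...}, 'row': {...}}`` layout that `get_sheet_margins()` seems to produce, and `special_margins()` and `to_a1()` are hypothetical helpers. It encodes the example sheet above to show where ``_^`` and ``__`` land.

```python
import numpy as np


def special_margins(full_cells):
    # Zero-based indices of the top/left (^) and bottom/right (_) rows/columns
    # holding at least one non-empty cell; raises ValueError on an empty sheet.
    rows, cols = np.nonzero(full_cells)
    if rows.size == 0:
        raise ValueError('The sheet has no non-empty cells!')
    return {
        'row': {'^': int(rows.min()), '_': int(rows.max())},
        'col': {'^': int(cols.min()), '_': int(cols.max())},
    }


def to_a1(row, col):
    # Zero-based (row, col) to A1-notation; single-letter columns only.
    return '%s%d' % (chr(ord('A') + col), row + 1)


# The example sheet above: X at C3, E4, B5 and F6 (rows 1-6, cols A-F).
sheet = np.zeros((6, 6), dtype=bool)
for r, c in [(2, 2), (3, 4), (4, 1), (5, 5)]:
    sheet[r, c] = True

m = special_margins(sheet)
print(to_a1(m['row']['^'], m['col']['_']))  # start-cell of `_^`: F3
print(to_a1(m['row']['_'], m['col']['_']))  # the `__` cell: F6
```

Note that in an xl-ref the column comes first, so ``_^`` means column ``_`` with row ``^``.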
.. Seealso:: `Target-termination rules`_ section

Ranges
------
To specify a complete `capture-range` we need to identify a 2nd cell. The 2nd target-cell may be specified:

- either with `absolute` coordinates, as above, or
- with `dependent` coords, using the dot (``.``) to refer to the 1st cell.
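
Resolving a 2nd cell then amounts to replacing each of its coordinates:

- a dot takes the matching coordinate of the 1st target-cell,
- ``^``/``_`` take the sheet margins,
- anything else is already absolute.

A minimal sketch of that rule follows. The names are hypothetical, indices are zero-based, and the margins of the example sheet are hard-coded.

```python
def resolve_coord(coord, margins, pcoord=None):
    # coord: an int (absolute), '^' or '_' (sheet margin) or '.' (dependent).
    # margins: {'^': int, '_': int} for this axis; pcoord: the 1st target's coord.
    if coord == '.':
        if pcoord is None:
            raise ValueError("A dependent coord ('.') needs a 1st target-cell!")
        return pcoord
    if coord in ('^', '_'):
        return margins[coord]
    return coord


def resolve_cell(row, col, margins, first_target=None):
    # Note the (row, col) order here, whereas xl-refs write the column first.
    prow, pcol = first_target if first_target is not None else (None, None)
    return (resolve_coord(row, margins['row'], prow),
            resolve_coord(col, margins['col'], pcol))


# Margins of the example sheet: non-empty rows 3-6, columns B-F (zero-based).
margins = {'row': {'^': 2, '_': 5}, 'col': {'^': 1, '_': 5}}

# `E1(D)` targets E4 == (3, 4); the 2nd cell `A.` then starts at A4 == (3, 0).
print(resolve_cell('.', 0, margins, first_target=(3, 4)))  # (3, 0)
```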
In the above example-sheet, here are some ways to specify ranges::

       A  B  C  D  E  F
    1
    2       ┌───────┐
        ┌───┼─┐     │
    3   │   │X│     │
        │┌──┼─┼─────┼┐
    4   ││  │ │    X││
        ││  └─┼─────┴┼───► C3:E4   A1(RD):..(RD)   _^(L):..(DR)   _4(L):A1(RD)
    5   ││X   │      │
        │└────┼──────┴───► B4:E5   A_(UR):..(RU)   _5(L):1_(UR)   E1(D):A.(DR)
    6   │     │       X
        └─────┴──────────► B3:C6   A1(RD):^_       ^^:C_          C_:^^

.. Warning::
    Of course, the above ranges WILL FAIL since the `target-moves` will stop
    immediately due to ``X`` values being surrounded by empty-cells.

    But the above diagram just conveys the general idea. To make it work,
    all the in-between cells of the peripheral rows and columns should also
    have been non-empty.

.. Note::
    The `capture-moves` from `1st-cell` to `2nd-target-cell` are independent
    of the implied `target-moves` in the case of `dependent` coords.

    More specifically, the `capturing` will always fetch the same values
    regardless of "row-first" or "column-first" order; this is not the case
    with `targeting` (``LURD``) moves.

    For instance, to capture ``B4:E5`` in the above sheet we may use
    ``_5(L):E.(U)``. In that case the target cells are ``B5`` and ``E4``,
    and the `target-moves` to reach the 2nd one are ``UR``, which differ
    from the ``U`` specified on the 2nd cell.
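
Because the capture-range is just the rectangle spanned by the two target-cells, it can be computed from per-axis minima and maxima. That is why the capturing order does not matter. A minimal sketch follows. The names are hypothetical, cells are zero-based (row, col) pairs, and a NumPy slice stands in for reading the values.

```python
import numpy as np


def capture_range(cell1, cell2):
    # Normalize two (row, col) target-cells into (top-left, bottom-right).
    (r1, c1), (r2, c2) = cell1, cell2
    return (min(r1, r2), min(c1, c2)), (max(r1, r2), max(c1, c2))


def to_a1(row, col):
    # Zero-based (row, col) to A1-notation; single-letter columns only.
    return '%s%d' % (chr(ord('A') + col), row + 1)


# Target-cells of `_5(L):E.(U)`: B5 == (4, 1) and E4 == (3, 4).
(top, left), (bottom, right) = capture_range((4, 1), (3, 4))
print('%s:%s' % (to_a1(top, left), to_a1(bottom, right)))  # B4:E5

# Reading the values is then a plain slice, whatever the targeting order.
values = np.arange(36).reshape(6, 6)
block = values[top:bottom + 1, left:right + 1]
print(block.shape)  # (2, 4)
```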
.. Seealso:: `Target-termination rules`_ section

Target-termination rules
--------------------------

- For the 1st target-cell:
  the target-cell is identified using the `search-opposite` rule.

  .. Note:: It might be useful to allow the user to reverse this behavior
      (e.g. by the use of the ``-`` char).

- For the 2nd target-cell:

  - If the `state` of the `2nd-start-cell` equals that of the `1st-target-cell`:

    - Use `search-same` to identify the target.

  - Otherwise:

    - Use `search-opposite` to identify the target.
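
Along a single primitive direction the two rules differ only in where they stop. The sketch below is a minimal one-dimensional version with hypothetical names. It ignores the row-after-row scanning that 2-letter moves perform. Here `states` lists the cell-states met while moving, starting with the `start-cell` itself.

```python
def search_along(states, same):
    # states[0] is the start-cell state; the rest follow the move direction.
    # search-same: index of the LAST cell of the initial run of equal states.
    # search-opposite: index of the FIRST cell whose state differs.
    start = states[0]
    for i, state in enumerate(states[1:], start=1):
        if state != start:
            return i - 1 if same else i
    if same:
        return len(states) - 1
    raise ValueError('No cell with the opposite state was found!')


def termination_rule(is_2nd_cell, nd_start_state=None, st_target_state=None):
    # 1st cell: always search-opposite.  2nd cell: search-same only when its
    # start-cell has the same state as the 1st target-cell.
    if is_2nd_cell and nd_start_state == st_target_state:
        return 'search-same'
    return 'search-opposite'


row = [False, False, True, True, False]   # empty, empty, full, full, empty
print(search_along(row, same=False))      # 2: first full cell
print(search_along(row[2:], same=True))   # 1: last full cell of that run
```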

Expansions
----------

Captured-ranges ("values") may be limited due to empty-cells in the 1st
row/column traversed. To overcome this, the xl-ref may specify `expansions`
directions using a 3rd ``:``-section like this::

    _5(L):1_(UR):RDL?U?

This particular case means:

    > Try expanding Right and Down repeatedly and then try once Left and Up.

Expansion happens on a row-by-row or column-by-column basis, and terminates
when a line made entirely of empty (or non-empty) cells is met.
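
One way to read such a section, consistent with the description above, is the hypothetical tokenizer sketched below. In this reading:

- bare letters repeat together,
- ``?`` means once,
- digits give a count.

It is not the module's `_parse_range_expansions()`, which returns ``itertools.repeat`` objects instead.

```python
import re

_TOKEN = re.compile('([LURD])([?]|[0-9]+)?')
_WHOLE = re.compile('(?:[LURD](?:[?]|[0-9]+)?)+')


def parse_expansions(rng_exp):
    # Returns [(moves, times), ...]; times=None means "repeat while it grows".
    # A letter followed by '?' runs once, by digits runs that many times;
    # consecutive bare letters are grouped and repeated together.
    rng_exp = rng_exp.upper()
    if not _WHOLE.fullmatch(rng_exp):
        raise ValueError('Invalid range-expansion(%s)' % rng_exp)
    groups, pending = [], ''
    for letter, suffix in _TOKEN.findall(rng_exp):
        if not suffix:
            pending += letter
            continue
        if pending:
            groups.append((pending, None))
            pending = ''
        groups.append((letter, 1 if suffix == '?' else int(suffix)))
    if pending:
        groups.append((pending, None))
    return groups


print(parse_expansions('RDL?U?'))  # [('RD', None), ('L', 1), ('U', 1)]
```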
Example-refs are given below for capturing the 2 marked tables::

      A  B C D E F  G
    1  ┌───────────┐
       │┌─────────┐│
    2  ││  1 X X  ││
       ││         ││
    3  ││X X X X  ││
       ││         ││
    4  ││X X X 2 X││
       ││         ││
    5  ││X X X   X││
       └┼─────────┼┴──► A1(RD):..(RD):DRL?
    6   │X        │
        └─────────┴───► A1(RD):..(RD):L?DR       A_(UR):^^(RD)
    7 X

- The 'X's signify non-empty cells.
- The '1' and '2' signify the identified target-cells.
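
The row-by-row or column-by-column growth can be sketched with NumPy under one reading: a range's edge advances while the adjacent line still holds at least one cell with the range's state. The code below is a hypothetical re-implementation under that assumption, using zero-based ``((top, left), (bottom, right))`` ranges. On the sample array it gives the same results as the `expand_range()` doctests of this module.

```python
import numpy as np


def _grow(full_cells, rng, direction, state):
    # Try to push one edge of the range by one line in `direction`.
    (r0, c0), (r1, c1) = rng
    nrows, ncols = full_cells.shape
    if direction == 'U' and 0 < r0 <= nrows:
        line, new = full_cells[r0 - 1, c0:c1 + 1], ((r0 - 1, c0), (r1, c1))
    elif direction == 'D' and r1 + 1 < nrows:
        line, new = full_cells[r1 + 1, c0:c1 + 1], ((r0, c0), (r1 + 1, c1))
    elif direction == 'L' and 0 < c0 <= ncols:
        line, new = full_cells[r0:r1 + 1, c0 - 1], ((r0, c0 - 1), (r1, c1))
    elif direction == 'R' and c1 + 1 < ncols:
        line, new = full_cells[r0:r1 + 1, c1 + 1], ((r0, c0), (r1, c1 + 1))
    else:
        return rng  # the next line would fall outside the sheet
    # Grow only if the adjacent line holds a cell with the range's state.
    return new if (line == state).any() else rng


def expand(full_cells, rng, groups, state=True):
    # groups: [(moves, times), ...] with times=None meaning "until no growth".
    full_cells = np.asarray(full_cells, dtype=bool)
    for moves, times in groups:
        done = 0
        while times is None or done < times:
            before = rng
            for direction in moves:
                rng = _grow(full_cells, rng, direction, state)
            done += 1
            if rng == before:
                break
    return rng


sheet = np.zeros((8, 7), dtype=bool)
sheet[5, 4:] = True
sheet[6, [3, 6]] = True
sheet[7, 3:] = True
print(expand(sheet, ((6, 5), (6, 5)), [('LURD', None)]))  # ((5, 3), (7, 6))
```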
.. default-role:: obj
"""
# noinspection PyUnresolvedReferences # noinspection PyUnresolvedReferences
XL_CELL_BLANK, XL_CELL_ERROR, XL_CELL_BOOLEAN, XL_CELL_NUMBER, open_workbook)
else:
    'L': np.array([0, -1]),
    'U': np.array([-1, 0]),
    'R': np.array([0, 1]),
    'D': np.array([1, 0])
}
r"""
    ^\s*(?:(?P<sheet>[^!]+)?!)?          # xl sheet name
    (?:                                  # first cell
        (?P<st_col>[A-Z]+|_|\^)          # first col
        (?P<st_row>\d+|_|\^)             # first row
        (?:\(
            (?P<st_mov>L|U|R|D|LD|LU|UL|UR|RU|RD|DL|DR)  # moves from st cell
            \)
        )?
    )
    (?::                                 # second cell [opt]
        (?P<nd_col>[A-Z]+|_|\^|\.)       # second col
        (?P<nd_row>\d+|_|\^|\.)          # second row
        (?:\(
            (?P<nd_mov>L|U|R|D|LD|LU|UL|UR|RU|RD|DL|DR)  # moves from nd cell
            \)
        )?
        (?::
            (?P<rng_exp>[LURD?\d]+)      # range expansion [opt]
        )?
    )?
    \s*
    (?P<json>\{.*\})?                    # any json object [opt]
    \s*$""", re.IGNORECASE | re.X)
# TODO: Drop `?` from range_expansions, use numbers only.
r"""
    ^(?P<moves>[LURD]+)                  # primitive moves
    (?P<times>\?|\d+)?                   # repetition times
    $""", re.IGNORECASE | re.X)
""" Converts the Excel `str` row to a zero-based `int`, reporting invalids.
:param str, int coord: excel-row coordinate or one of ``^_.``
:return: excel row number, >= 0
:rtype: int

Examples::

    >>> row2num('1')
    0

    >>> row2num('10') == row2num(10)
    True

    ## "Special" cells are also valid.
    >>> row2num('_'), row2num('^')
    ('_', '^')

    >>> row2num('0')
    Traceback (most recent call last):
    ValueError: Invalid row('0')!

    >>> row2num('a')
    Traceback (most recent call last):
    ValueError: Invalid row('a')!

    >>> row2num(None)
    Traceback (most recent call last):
    ValueError: Invalid row(None)!
"""
""" Converts the Excel `str` column to a zero-based `int`, reporting invalids.
:param str coord: excel-column coordinate or one of ``^_.``
:return: excel column number, >= 0
:rtype: int

Examples::

    >>> col2num('D')
    3

    >>> col2num('d')
    3

    >>> col2num('AaZ')
    727

    ## "Special" cells are also valid.
    >>> col2num('_'), col2num('^')
    ('_', '^')

    >>> col2num(None)
    Traceback (most recent call last):
    ValueError: Invalid column(None)!

    >>> col2num('4')
    Traceback (most recent call last):
    ValueError: Invalid column('4')!

    >>> col2num(4)
    Traceback (most recent call last):
    ValueError: Invalid column(4)!
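
Column letters form a bijective base-26 number (``A`` = 1 ... ``Z`` = 26, ``AA`` = 27), so a zero-based index is that number minus one. Rows are simply 1-based integers. A minimal sketch of both conversions follows. `col_to_index()` and `row_to_index()` are hypothetical names, and unlike `row2num()`/`col2num()` they do not pass the special ``^_.`` coordinates through.

```python
def col_to_index(col):
    # 'A' -> 0, 'Z' -> 25, 'AA' -> 26, 'AAZ' -> 727 (case-insensitive).
    if not isinstance(col, str) or not col or not all(
            'A' <= ch <= 'Z' or 'a' <= ch <= 'z' for ch in col):
        raise ValueError('Invalid column(%r)!' % (col,))
    num = 0
    for ch in col.upper():
        num = num * 26 + (ord(ch) - ord('A') + 1)
    return num - 1


def row_to_index(row):
    # '1' -> 0, 10 -> 9; rejects non-numbers and rows below 1.
    try:
        num = int(row)
    except (TypeError, ValueError):
        raise ValueError('Invalid row(%r)!' % (row,))
    if num < 1:
        raise ValueError('Invalid row(%r)!' % (row,))
    return num - 1


print(col_to_index('AaZ'), row_to_index('1'))  # 727 0
```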
"""
""" Fetch a cell reference string.

:param cell_col: column reference
:type cell_col: str, None

:param cell_row: row reference
:type cell_row: str, None

:param cell_mov: target-moves
:type cell_mov: str, None

:return: a cell-start
:rtype: CellPos

Examples::

    >>> make_CellPos('A', '1', 'R')
    CellPos(cell=Cell(row=0, col=0), mov='R')

    >>> make_CellPos('^', '^', 'R').cell
    Cell(row='^', col='^')

    >>> make_CellPos('_', '_', 'L').cell
    Cell(row='_', col='_')

    >>> make_CellPos('.', '.', 'D').cell
    Cell(row='.', col='.')

    >>> make_CellPos(None, None, None)

    >>> make_CellPos('1', '.', None)
    Traceback (most recent call last):
    ValueError: Invalid cell(col='1', row='.') due to: Invalid column('1')!

    >>> make_CellPos('A', 'B', None)
    Traceback (most recent call last):
    ValueError: Invalid cell(col='A', row='B') due to: Invalid row('B')!

    >>> make_CellPos('A', '1', 12)
    Traceback (most recent call last):
    ValueError: Invalid cell(col='A', row='1') due to: 'int' object has no attribute 'upper'
"""
else:
""" Examples::

    >>> list(_repeat_moves('ABC', '3'))
    ['ABC', 'ABC', 'ABC']

    >>> list(_repeat_moves('ABC', '0'))
    []

    >>> _repeat_moves('ABC')  ## infinite repetitions
    repeat('ABC')
"""
""" Parse range-expansion into a list of dir-letters iterables.
:param rng_exp: A string with a sequence of primitive moves, e.g. L1U1R1D1
:type rng_exp: str

:return: A list of primitive-dir chains.
:rtype: list

Examples::

    >>> res = _parse_range_expansions('LURD?')
    >>> res
    [repeat('LUR'), repeat('D', 1)]

    # infinite generator
    >>> [next(res[0]) for i in range(10)]
    ['LUR', 'LUR', 'LUR', 'LUR', 'LUR', 'LUR', 'LUR', 'LUR', 'LUR', 'LUR']

    >>> list(res[1])
    ['D']

    >>> _parse_range_expansions('1LURD')
    Traceback (most recent call last):
    ValueError: Invalid range-expansion(1LURD) due to: 'NoneType' object has no attribute 'groupdict'
"""
for v in res if v != '']
""" Parses a :term:`excel-ref` and splits it in its "ingredients".
:param xl_ref:
    a string with the following format::

        <sheet>!<st_col><st_row>(<st_mov>):<nd_col><nd_row>(<nd_mov>):<rng_exp>{<json>}

    e.g.::

        sheet!A1(DR):Z20(UL):L1U2R1D1{"json":"..."}
:type xl_ref: str

:return: dictionary containing the following parameters::

    - sheet
    - st_cell
    - nd_cell
    - rng_exp
    - json

:rtype: dict

Examples::

    >>> from itertools import chain
    >>> xl_ref = 'Sheet1!A1(DR):Z20(UL):L1U2R1D1{"json":"..."}'
    >>> res = parse_xl_ref(xl_ref)

    >>> res['sheet']
    'Sheet1'

    >>> res['st_cell']
    CellPos(cell=Cell(row=0, col=0), mov='DR')

    >>> res['nd_cell']
    CellPos(cell=Cell(row=19, col=25), mov='UL')

    >>> list(chain(*res['rng_exp']))
    ['L', 'U', 'U', 'R', 'D']

    >>> res['json'] == {'json': '...'}
    True
"""
# resolve json
# resolve range expansions r['rng_exp']) if r['rng_exp'] else None
# fetch 1st cell
# fetch 2nd cell
""" Parses the contents of an excel url.
:param str url: a string with the following format::

    <url_file>#<sheet>!<1st_cell>:<2nd_cell>:<expand><json>

Example::

    file:///path/to/file.xls#sheet_name!UP10:DN20:LDL1{"dim":2}

:return: dictionary containing the following parameters::

    - url_file
    - sheet
    - st_cell
    - nd_cell
    - rng_exp
    - json

:rtype: dict

Examples::

    >>> url = 'file:///sample.xlsx#Sheet1!A1{"2": "ciao"}'
    >>> res = parse_xl_url(url)
    >>> sorted(res.items())
    [('json', {'2': 'ciao'}), ('nd_cell', None), ('rng_exp', None), ('sheet', 'Sheet1'), ('st_cell', CellPos(cell=Cell(row=0, col=0), mov=None)), ('url_file', 'file:///sample.xlsx')]
"""
# noinspection PyProtectedMember
""" Returns a boolean ndarray with `False` wherever cells are blank or empty. """
""" Returns upper and lower absolute positions.
:param ndarray full_cells:
    A boolean ndarray with `False` wherever cells are blank or empty.
    Use :func:`get_full_cells()`.
:return: a 2-tuple with margins and indices for full-cells

Examples::

    >>> full_cells = [
    ...     [0, 0, 0],
    ...     [0, 1, 0],
    ...     [0, 1, 1],
    ...     [0, 0, 1],
    ... ]
    >>> sheet_margins, indices = get_sheet_margins(full_cells)

    #>>> sorted(sheet_margins.items())  ## FIXME: Nested DICT??
    [('col', {'^': 1, '_': 2}), ('row', {'^': 1, '_': 3})]

    >>> indices
    [[1, 1], [2, 1], [2, 2], [3, 2]]

    >>> full_cells = [
    ...     [0, 0, 0, 0],
    ...     [0, 1, 0, 0],
    ...     [0, 1, 1, 0],
    ...     [0, 0, 1, 0],
    ...     [0, 0, 0, 0],
    ... ]
    >>> sheet_margins_2, _ = get_sheet_margins(full_cells)
    >>> sheet_margins_2 == sheet_margins
    True
""" 'col': { '^': up_c, '_': dn_c }, 'row': { '^': up_r, '_': dn_r } }
""" Translates any special or dependent coord to absolute ones.
:param int, str coord: the coord to translate
:param int, None pcoord: the basis for dependent coord, if any

No other checks performed::

    >>> margins = {}
    >>> _get_abs_coord('_', margins)
    '_'

    >>> _get_abs_coord('$', margins)
    '$'
"""
except ImportError:
    # TODO: FIX hack when ChainMap backported to py2.
    c = {'.': pcoord}
    c.update(coord_margins)
    coord_margins = c
""" Makes a Cell by translating any special coords to absolute ones.
:param Cell cell: The cell to translate its coords.
:param Cell pcell: The cell to base any dependent coords (``.``).

Examples::

    >>> _make_start_Cell(Cell(3, 1), {'row':{}, 'col':{}})
    Cell(row=3, col=1)
""" cell.row, sheet_margins['row'], pcell and pcell.row) cell.col, sheet_margins['col'], pcell and pcell.col)
"""
:param bool state: the starting-state
:param cell:
:param ndarray full_cells:
    A boolean ndarray with `False` wherever cells are blank or empty.
    Use :func:`get_full_cells()`.
:param sheet:
:param directions:
:return:

Examples::

    >>> full_cells = np.array([
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 1, 1, 1],
    ...     [0, 0, 0, 1, 0, 0, 1],
    ...     [0, 0, 0, 1, 1, 1, 1]
    ... ])
    >>> args = (False, Cell(1, 1), full_cells, (0, 0), (7, 6))
    >>> _search_opposite_state(*(args + ('DR', )))
    Cell(row=6, col=3)

    >>> _search_opposite_state(*(args + ('RD', )))
    Cell(row=5, col=4)

    >>> _search_opposite_state(*(args + ('D', )))
    Traceback (most recent call last):
    ValueError: Invalid Cell(row=1, col=1) with movement(D)

    >>> _search_opposite_state(*(args + ('U', )))
    Traceback (most recent call last):
    ValueError: Invalid Cell(row=1, col=1) with movement(U)

    >>> _search_opposite_state(*(args + ('R', )))
    Traceback (most recent call last):
    ValueError: Invalid Cell(row=1, col=1) with movement(R)

    >>> _search_opposite_state(*(args + ('L', )))
    Traceback (most recent call last):
    ValueError: Invalid Cell(row=1, col=1) with movement(L)

    >>> _search_opposite_state(*(args + ('LU', )))
    Traceback (most recent call last):
    ValueError: Invalid Cell(row=1, col=1) with movement(LU)

    >>> args = (True, Cell(6, 3), full_cells, (0, 0), (7, 6))
    >>> _search_opposite_state(*(args + ('D', )))
    Cell(row=8, col=3)

    >>> args = (True, Cell(10, 3), full_cells, (0, 0), (7, 6))
    >>> _search_opposite_state(*(args + ('U', )))
    Cell(row=10, col=3)

    >>> args = (False, Cell(10, 10), full_cells, (0, 0), (7, 6))
    >>> _search_opposite_state(*(args + ('UL', )))
    Cell(row=7, col=6)

    >>> full_cells = np.array([
    ...     [1, 1, 1],
    ...     [1, 1, 1],
    ...     [1, 1, 1],
    ... ])
    >>> args = (True, Cell(0, 2), full_cells, (0, 0), (2, 2))
    >>> _search_opposite_state(*(args + ('LD', )))
    Cell(row=3, col=2)
"""
c0 = c0 - _primitive_dir[moves[1]]
"""
:param bool state: the starting-state
:param cell:
:param ndarray full_cells:
    A boolean ndarray with `False` wherever cells are blank or empty.
    Use :func:`get_full_cells()`.
:param sheet:
:param directions:
:return:

Examples::

    >>> full_cells = np.array([
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 1, 1, 1],
    ...     [0, 0, 0, 1, 0, 0, 1],
    ...     [0, 0, 0, 1, 1, 1, 1]
    ... ])
    >>> args = (True, Cell(7, 6), full_cells, (0, 0), (7, 6))
    >>> _search_same_state(*(args + ('UL', )))
    Cell(row=5, col=3)

    >>> _search_same_state(*(args + ('U', )))
    Cell(row=5, col=6)

    >>> _search_same_state(*(args + ('L', )))
    Cell(row=7, col=3)

    >>> args = (True, Cell(5, 3), full_cells, (0, 0), (7, 6))
    >>> _search_same_state(*(args + ('DR', )))
    Cell(row=5, col=3)

    >>> args = (False, Cell(5, 3), full_cells, (0, 0), (7, 6))
    >>> _search_same_state(*(args + ('DR', )))
    Cell(row=5, col=3)

    >>> _search_same_state(*(args + ('UL', )))
    Traceback (most recent call last):
    ValueError: Invalid Cell(row=5, col=3) with movement(U)

    >>> args = (True, Cell(5, 6), full_cells, (0, 0), (7, 6))
    >>> _search_same_state(*(args + ('DL', )))
    Cell(row=7, col=4)
"""
"""
:param state:
:param up:
:param dn:
:param rng:
:param ndarray full_cells:
    A boolean ndarray with `False` wherever cells are blank or empty.
    Use :func:`get_full_cells()`.
:param rng_exp:
:return:

Examples::

    >>> full_cells = np.array([
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 1, 1, 1],
    ...     [0, 0, 0, 1, 0, 0, 1],
    ...     [0, 0, 0, 1, 1, 1, 1]
    ... ])
    >>> rng = (Cell(row=6, col=3), Cell(row=6, col=3))
    >>> rng_exp = [_repeat_moves('U', times=10)]
    >>> expand_range(True, rng, full_cells, rng_exp)
    [Cell(row=6, col=3), Cell(row=6, col=3)]

    >>> rng = (Cell(row=6, col=3), Cell(row=7, col=3))
    >>> rng_exp = [_repeat_moves('R', times=10)]
    >>> expand_range(True, rng, full_cells, rng_exp)
    [Cell(row=6, col=3), Cell(row=7, col=6)]

    >>> rng = (Cell(row=6, col=3), Cell(row=10, col=3))
    >>> rng_exp = [_repeat_moves('R', times=10)]
    >>> expand_range(True, rng, full_cells, rng_exp)
    [Cell(row=6, col=3), Cell(row=10, col=6)]

    >>> rng = (Cell(row=6, col=5), Cell(row=6, col=5))
    >>> rng_exp = [_repeat_moves('LURD')]
    >>> expand_range(True, rng, full_cells, rng_exp)
    [Cell(row=5, col=3), Cell(row=7, col=6)]
""" 'L': (0, 1), 'U': (0, 1), 'R': (1, 0), 'D': (1, 0) } else:
nd_cell=None, rng_exp=None): """
:param xlrd.sheet.Sheet sheet:
:param CellPos st_cell:
:param CellPos nd_cell:
:param rng_exp:
:return:

Examples::

    >>> full_cells = np.array([
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 0, 0, 0],
    ...     [0, 0, 0, 0, 1, 1, 1],
    ...     [0, 0, 0, 1, 0, 0, 1],
    ...     [0, 0, 0, 1, 1, 1, 1]
    ... ])
    >>> up, dn = ((0, 0), (7, 6))
    >>> sheet_margins, ind = get_sheet_margins(full_cells)
    >>> st_cell = CellPos(Cell(0, 0), 'DR')
    >>> nd_cell = CellPos(Cell('.', '.'), 'DR')
    >>> _capture_range(full_cells, up, dn, sheet_margins, ind, st_cell, nd_cell)
    (Cell(row=6, col=3), Cell(row=7, col=3))

    >>> nd_cell = CellPos(Cell(7, 6), 'UL')
    >>> _capture_range(full_cells, up, dn, sheet_margins, ind, st_cell, nd_cell)
    (Cell(row=5, col=3), Cell(row=6, col=3))
"""
else:
else: not state, nd, full_cells, up, dn, mov)
else: return expand_range(state, (st, nd), full_cells, rng_exp)
""" Parse a xl-cell.
:param cell: an excel cell
:type cell: xlrd.sheet.Cell

:param epoch1904:
    Which date system was in force when this file was last saved.
    False => 1900 system (the Excel for Windows default).
    True => 1904 system (the Excel for Macintosh default).
:type epoch1904: bool

:return: formatted cell value
:rtype:
    int, float, datetime.datetime, bool, None, str, datetime.time,
    float('nan')

Examples::

    >>> import xlrd
    >>> from xlrd.sheet import Cell
    >>> _parse_cell(Cell(xlrd.XL_CELL_NUMBER, 1.2))
    1.2

    >>> _parse_cell(Cell(xlrd.XL_CELL_DATE, 1.2))
    datetime.datetime(1900, 1, 1, 4, 48)

    >>> _parse_cell(Cell(xlrd.XL_CELL_TEXT, 'hi'))
    'hi'
"""
# GH5394 - Excel 'numbers' are always floats
# it's a minimal perf hit and less surprising
# Use the newer xlrd datetime handling.
# Excel doesn't distinguish between dates and time, so we treat
# dates on the epoch as times only. Also, Excel supports 1900 and
# 1904 epochs.
else:
    # Use the xlrd <= 0.9.2 date handling.
    d = xldate.xldate_as_tuple(cell.value, epoch1904)
    if d[0] < datetime.MINYEAR:  # time
        d = datetime.time(*d[3:])
    else:  # date
        d = datetime.datetime(*d)

return float('nan')
raise ValueError('invalid cell type %s for %s' % (cell.ctype, cell.value))
"""
:param sheet:
:param xl_range:
:param indices:
:param epoch1904:
:return:

Examples::

    >>> import os, tempfile, xlrd, pandas as pd

    >>> os.chdir(tempfile.mkdtemp())
    >>> df = pd.DataFrame([[None, None, None], [5.1, 6.1, 7.1]])
    >>> tmp = 'sample.xlsx'
    >>> writer = pd.ExcelWriter(tmp)
    >>> df.to_excel(writer, sheet_name='Sheet1', startrow=5, startcol=3)
    >>> writer.close()

    >>> sheet = xlrd.open_workbook(tmp).sheet_by_name('Sheet1')

    >>> sheet_margins, indices = get_sheet_margins(get_full_cells(sheet))

    # minimum matrix in the sheet
    >>> st = _make_start_Cell(Cell('^', '^'), sheet_margins)
    >>> nd = _make_start_Cell(Cell('_', '_'), sheet_margins)
    >>> get_xl_table(sheet, (st, nd), indices)
    [[None, 0, 1, 2], [0, None, None, None], [1, 5.1, 6.1, 7.1]]

    # get single value
    >>> get_xl_table(sheet, (Cell(6, 3), Cell(6, 3)), indices)
    [0]

    # get column vector
    >>> st = _make_start_Cell(Cell(0, 3), sheet_margins)
    >>> nd = _make_start_Cell(Cell('_', 3), sheet_margins)
    >>> get_xl_table(sheet, (st, nd), indices)
    [None, None, None, None, None, None, 0, 1]

    # get row vector
    >>> st = _make_start_Cell(Cell(5, 0), sheet_margins)
    >>> nd = _make_start_Cell(Cell(5, '_'), sheet_margins)
    >>> get_xl_table(sheet, (st, nd), indices)
    [None, None, None, None, 0, 1, 2]

    # get row vector extending beyond the sheet's right margin
    >>> st = _make_start_Cell(Cell(5, 0), sheet_margins)
    >>> nd = _make_start_Cell(Cell(5, 10), sheet_margins)
    >>> get_xl_table(sheet, (st, nd), indices)
    [None, None, None, None, 0, 1, 2, None, None, None, None]
""" else: # vector
# vector
else:
""" FIXME: _get_value_dim() UNUSED? """
""" Reshapes the output value of get_rect_range function.
:param value: matrix or vector or value
:type value: list of lists, list, value

:param dim_min: minimum dimension
:type dim_min: int, None

:param dim_max: maximum dimension
:type dim_max: int, None

:return: reshaped value
:rtype: list of lists, list, value

Examples::

    >>> redim_captured_values([1, 2], 2)
    [[1, 2]]

    >>> redim_captured_values([[1, 2]], 1)
    [[1, 2]]

    >>> redim_captured_values([[1, 2]], 1, 1)
    [1, 2]

    >>> redim_captured_values([], 2)
    [[]]

    >>> redim_captured_values([[1, 2]], 0, 0)
    Traceback (most recent call last):
    ValueError: Cannot reduce Captured-values dimension(2) to (0, 0)!
""" # TODO: Make redimming use np-arrays.
    None: {'fun': lambda x: x},  # TODO: Actually redim_captured_values().
    'df': {'fun': pd.DataFrame},
    'nparray': {'fun': np.array},
    'dict': {'fun': dict},
    'sorted': {'fun': sorted}
}
available_filters=default_range_filters): """ Processes the output value of get_rect_range function.
FIXME: Actually use process_captured_values()!
:param value: matrix or vector or a scalar-value
:type value: list of lists, list, value

:param str, None type:
    The 1st-filter to apply; if missing, applies the mapping found in
    the ``None --> <filter>`` entry of the `available_filters` dict.
:param dict, None kws: keyword arguments for the filter function
:param sequence, None args: arguments for the type-function
:param [(callable, *args, **kws)] filters:
    A list of 3-tuples ``(filter_callable, *args, **kws)``
    to further process range-values.
:param dict available_filters:
    Entries of ``<fun_names> --> <callables>`` for pre-configured
    filters available to post-process range-values.
    The callable for the `None` key is always applied to the original
    values, to ensure correct dimensionality.
:return: processed range-values
:rtype: given type, or list of lists, list, value
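
The lookup-then-call flow described by these parameters can be sketched as below. `apply_filters()` and `FILTERS` are hypothetical. Each filter is assumed to be a dict with a ``fun`` name plus optional ``args``/``kws``, as in the json of the xl-ref syntax examples, whereas the parameter list above describes 3-tuples.

```python
# Hypothetical registry of named post-processing callables.
FILTERS = {
    None: lambda values: values,  # always applied first
    'dict': dict,
    'sorted': sorted,
}


def apply_filters(values, filters=None, registry=FILTERS):
    # Run the `None` entry, then each filter spec in order.
    values = registry[None](values)
    for spec in filters or ():
        fun = registry[spec['fun']]
        values = fun(values, *spec.get('args', ()), **spec.get('kws', {}))
    return values


print(apply_filters([[1, 9], [8, 10], [5, 11]],
                    [{'fun': 'sorted', 'kws': {'reverse': True}}]))
# [[8, 10], [5, 11], [1, 9]]
```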
Examples::

    >>> value = [[1, 2], [3, 4], [5, 6]]
    >>> res = process_captured_values(value, type='dict')
    >>> sorted(res.items())
    [(1, 2), (3, 4), (5, 6)]

    >>> value = [[1, 9], [8, 10], [5, 11]]
    >>> process_captured_values(value,
    ...     filters=[{'type':'sorted', 'kws':{'reverse': True}}])
    [[8, 10], [5, 11], [1, 9]]
"""
#### XLRD HELPER FUNCS ###
""" Opens the excel workbook of an excel ref.
:param dict xl_ref_child: excel ref of the child
:param xl_ref_parent: excel ref of the parent
:type xl_ref_parent: dict, None
""" else:
""" Opens the excel sheet of an excel ref.
:param dict xl_ref_child: excel ref of the child
:param xl_ref_parent: excel ref of the parent
:type xl_ref_parent: dict, None
""" else: