Coverage for pandalone\xlasso\_lasso.py : 94%

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
#
# Copyright 2014 European Commission (JRC);
# Licensed under the EUPL (the 'Licence');
# You may not use this work except in compliance with the Licence.
# You may obtain a copy of the Licence at: http://ec.europa.eu/idabc/eupl
"""
The high-level functionality, the filtering and recursive :term:`lassoing`.
Prefer accessing the public members from the parent module.
.. currentmodule:: pandalone.xlasso """
""" A caching-store of :class:`ABCSheet` instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.
:ivar dict _cached_sheets: A cache of all _Spreadsheets accessed so far, keyed by multiple keys generated by :meth:`_derive_sheet_keys`. :ivar ABCSheet _current_sheet: The last used sheet, used when unspecified by the :term:`xl-ref`.
- To avoid opening non-trivial workbooks, use the :meth:`add_sheet()` to pre-populate this cache with them.
- The last sheet added becomes the *current-sheet*, and will be served when :term:`xl-ref` does not specify any workbook and sheet.
.. Tip:: For the simplest API usage, try this::
>>> sf = SheetsFactory() >>> sf.add_sheet(some_sheet) # doctest: +SKIP >>> lasso('A1:C3(U)', sf) # doctest: +SKIP
- The *current-sheet* is served only when workbook-id is `None`, that is, the id-pair ``('foo.xlsx', None)`` does not hit it, so those ids are sent to the cache as they are.
- To add another backend, modify the opening-sheets logic (ie clipboard), override :meth:`_open_sheet()`.
- It is a resource-manager for contained sheets, so it can be used with a `with` statement.
"""
else:
""" Retuns the product of user-specified and sheet-internal keys.
:param wb_ids: a single or a sequence of extra workbook-ids (ie: file, url) :param sh_ids: a single or sequence of extra sheet-ids (ie: name, index, None) """
for p in key_pairs if p[0]))
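A rough sketch of such a key product, assuming all ids arrive as sequences; `derive_keys_sketch` and its parameter names are hypothetical and only mirror the `if p[0]` filtering visible in the fragment above.

```python
from itertools import product


def derive_keys_sketch(internal_wb_ids, internal_sh_ids,
                       extra_wb_ids=(), extra_sh_ids=()):
    """Combine every workbook-id with every sheet-id into (wb, sh) pairs."""
    wb_ids = list(internal_wb_ids) + list(extra_wb_ids)
    sh_ids = list(internal_sh_ids) + list(extra_sh_ids)
    # Pairs whose workbook-id is falsy (e.g. None) are dropped.
    return set(p for p in product(wb_ids, sh_ids) if p[0])


keys = derive_keys_sketch(['foo.xlsx', None], ['Sheet1', 0])
```

Here `keys` holds the two pairs for `'foo.xlsx'`, since the `None` workbook-id is filtered out.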
"""Closes all contained sheets and empties cache."""
no_current=False): """ Updates cache and (optionally) `_current_sheet`.
:param wb_ids: a single or sequence of extra workbook-ids (ie: file, url) :param sh_ids: a single or sequence of extra sheet-ids (ie: name, index, None) """
msg = "No current-sheet exists yet!. Specify a Workbook." raise ValueError(msg)
sheet = csheet.open_sibling_sheet(sheet_id, opts) assert sheet, (wb_id, sheet_id, opts) self.add_sheet(sheet, wb_id, sheet_id) else:
"""OVERRIDE THIS to change backend."""
('xl_ref', 'url_file', 'sh_name', 'st_edge', 'nd_edge', 'exp_moves', 'call_spec', 'sheet', 'st', 'nd', 'values', 'base_coords', 'opts')) """ All the fields used by the algorithm, populated stage-by-stage by :class:`Ranger`.
:param str xl_ref: The full url, populated on parsing. :param str sh_name: Parsed sheet name (or index, but still as string), populated on parsing. :param Edge st_edge: The 1st edge, populated on parsing. :param Edge nd_edge: The 2nd edge, populated on parsing. :param Coords st: The top-left targeted coords of the :term:`capture-rect`, populated on :term:`capturing`. :param Coords nd: The bottom-right targeted coords of the :term:`capture-rect`, populated on :term:`capturing`. :param ABCSheet sheet: The sheet fetched from the factory or the ranger's current sheet, populated after :term:`capturing` and before reading. :param values: The excel's table-values captured by the :term:`lasso`, populated after reading and updated while applying :term:`filters`. :param dict or ChainMap opts: - Before *parsing*, they are just any 'opts' dict found in the :term:`filters`. - After *parsing*, a 2-map ChainMap with :attr:`Ranger.base_opts` and options extracted from *filters* on top. """
"""Make :class:`Lasso` construct with all missing fields as `None`."""
""" The director-class that performs all stages required for "throwing the lasso" around rect-values.
Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.
The :meth:`do_lasso()` does the job.
:ivar SheetsFactory sheets_factory: Factory of sheets from where to parse rect-values; does not close it in the end. May be `None`, but :meth:`do_lasso()` will scream unless invoked with a `context_lasso` arg containing a concrete :class:`ABCSheet`. :ivar dict base_opts: The :term:`opts` that are deep-copied and used as the defaults for every :meth:`do_lasso()`, whether invoked directly or recursively by :meth:`recursive_filter()`. If unspecified, no opts are used, but this attr is set to an empty dict. See :func:`get_default_opts()`. :ivar dict or None available_filters: No filters exist if unspecified. See :func:`get_default_filters()`. :ivar Lasso intermediate_lasso: A ``('stage', Lasso)`` pair with the last :class:`Lasso` instance produced during the last execution of the :meth:`do_lasso()`. Used for inspecting/debugging. :ivar Context: On recursive invocations with :meth:`recursive_filter`, these fields are extracted from :meth:`do_lasso()` `context_kwds` arg and preserved when the parsed ones are `None`. """
base_opts=None, available_filters=None):
"""Replace lasso-values and updated :attr:`intermediate_lasso`."""
# Just to update intermediate_lasso.
func_desc = _build_call_help(func_name, func, func_desc) log.warning( msg, func_name, args, kwds, ex, help_msg, exc_info=1) else:
""" Apply all call-specifiers one after another on the captured values.
:param list pipe: the call-specifiers """
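Applying call-specifiers in sequence could look roughly like the sketch below. It is simplified on purpose: here a call-specifier is assumed to be a `(name, args, kwds)` triple and each filter receives the bare values, whereas the real filters receive `(Ranger, Lasso, *args, **kwds)`.

```python
def apply_pipe_sketch(values, pipe, filters):
    """Feed `values` through each named filter in order."""
    for name, args, kwds in pipe:
        values = filters[name](values, *args, **kwds)
    return values


# Hypothetical filter registry built from plain builtins.
filters = {'sorted': sorted, 'dict': dict}
pipe = [('sorted', (), {}), ('dict', (), {})]
result = apply_pipe_sketch([('b', 2), ('a', 1)], pipe, filters)
```

The output of each step becomes the input of the next, so the order of the call-specifiers matters.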
""" Recursively expand any :term:`xl-ref` strings found by treating values as mappings (dicts, df, series) and/or nested lists.
- The `include`/`exclude` filter args work only for dict-like objects with ``items()`` or ``iteritems()`` and indexing methods, i.e. Mappings, series and dataframes.
- If no filter arg specified, expands for all keys. - If only `include` specified, rejects all keys not explicitly contained in this filter arg. - If only `exclude` specified, expands all keys not explicitly contained in this filter arg. - When both `include`/`exclude` exist, only those explicitly included are accepted, unless also excluded.
- Lower the :mod:`logging` level to see other than syntax-errors on recursion reported on :data:`log`. - Only those in :attr:`Ranger.Context` are passed recursively.
:param list or str include: Items to include in the recursive-search. See description above. :param list or str exclude: Items to exclude from the recursive-search. See description above. :param int or None depth: How deep to dive into nested structures for parsing xl-refs. If `< 0`, no limit. If 0, stops completely. """
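The four `include`/`exclude` rules listed above reduce to a small predicate. The sketch below is one reading of those rules; `should_expand` is a hypothetical helper, not a function of the module.

```python
def should_expand(key, include=None, exclude=None):
    """Decide whether the value under `key` is searched for xl-refs."""
    # A single string is treated as a one-item list.
    if isinstance(include, str):
        include = [include]
    if isinstance(exclude, str):
        exclude = [exclude]
    if include is not None and key not in include:
        return False   # not explicitly included
    if exclude is not None and key in exclude:
        return False   # excluded, even when also included
    return True


keys = ['a', 'b', 'c']
only_a_b = [k for k in keys if should_expand(k, include=['a', 'b'])]
not_b = [k for k in keys if should_expand(k, exclude='b')]
both = [k for k in keys if should_expand(k, include=['a', 'b'], exclude=['b'])]
```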
msg = '%s \n @Lasso: %s' % (msg, lasso)
if cdepth == 0: base_coords = base_coords._replace(row=i) elif cdepth == 1: base_coords = base_coords._replace(col=i)
lasso._asdict()) log.debug(msg, vals, ex)
else: # Dict is not ordered, so cannot locate `base_coords`! if isinstance(vals, dict) else new_base_coords(base_coords, cdepth, i))
"""Creates the lasso to be used for each new :meth:`do_lasso()` invocation."""
""" Merges xl-ref parsed-parsed_fields with `init_lasso`, reporting any errors.
:param Lasso init_lasso: Default values to be overridden by non-nulls. Note that ``init_lasso.opts`` must be a `ChainMap`, as returned by :meth:`_make_init_Lasso()`.
:return: a Lasso with any non `None` parsed-fields updated """
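A minimal sketch of this kind of merge follows, assuming a reduced three-field tuple and assuming that parsed opts are layered on top of the initial `ChainMap` with `new_child()`; both `FieldsSketch` and `merge_parsed_sketch` are hypothetical.

```python
from collections import ChainMap, namedtuple

FieldsSketch = namedtuple('FieldsSketch', ('url_file', 'sh_name', 'opts'))


def merge_parsed_sketch(init, parsed):
    """Override `init` with the non-None parsed fields."""
    non_null = {k: v for k, v in parsed.items()
                if v is not None and k != 'opts'}
    merged = init._replace(**non_null)
    if parsed.get('opts'):
        # Parsed opts shadow the base ones without modifying them.
        merged = merged._replace(opts=init.opts.new_child(parsed['opts']))
    return merged


init = FieldsSketch('a.xlsx', None, ChainMap({'lax': False}))
parsed = {'url_file': None, 'sh_name': 'Sheet1', 'opts': {'lax': True}}
merged = merge_parsed_sketch(init, parsed)
```

The `None` for `url_file` leaves the default `'a.xlsx'` in place, while the base opts stay reachable underneath the parsed ones.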
parsed_fields) log.debug(msg, xlref, ex, exc_info=1) # raise fututils.raise_from(ValueError(msg % (xlref, ex)), ex) see GH # 141 raise ValueError(msg % (xlref, ex))
wb_ids=url_file, sh_ids=sh_name)
lasso.url_file, lasso.sh_name, lasso.opts) lasso.url_file, lasso.sh_name, lasso.opts) # Maybe context had a Sheet already. raise ValueError(msg % (lasso.url_file, lasso.sh_name, ex))
sheet.get_margin_coords(), lasso.st_edge, lasso.nd_edge, lasso.exp_moves, lasso.base_coords) raise ValueError(msg % (_Lasso_to_edges_str(lasso), ex))
""" The director-method that does all the job of hrowing a :term:`lasso` around spreadsheet's rect-regions according to :term:`xl-ref`.
:param str xlref: a string with the :term:`xl-ref` format::
<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
i.e.::
file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
:param Lasso context_kwds: Default :class:`Lasso` fields in case parsed ones are `None` :return: The final :class:`Lasso` with captured & filtered values. :rtype: Lasso """ raise ValueError("Expected a string as `xl-ref`: %s" % xlref)
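To make the layout of the xl-ref format concrete, the sketch below splits the example string into its parts with plain string operations. It is only an approximation under simplifying assumptions (first `#`, first `!`, first `{`); the module's actual parser handles far more syntax, and `split_xlref_sketch` is a hypothetical name.

```python
import json


def split_xlref_sketch(xlref):
    """Naively split '<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>'."""
    if not isinstance(xlref, str):
        raise ValueError("Expected a string as `xl-ref`: %s" % (xlref,))
    url_file = sh_name = js_filt = None
    rest = xlref
    if '#' in rest:
        url_file, _, rest = rest.partition('#')
    if '!' in rest:
        sh_name, _, rest = rest.partition('!')
    brace = rest.find('{')
    if brace >= 0:
        # Assumption: the trailing filter part is plain JSON.
        js_filt = json.loads(rest[brace:])
        rest = rest[:brace]
    st_edge, nd_edge, exp_moves = (rest.split(':') + [None, None])[:3]
    return {'url_file': url_file or None, 'sh_name': sh_name or None,
            'st_edge': st_edge, 'nd_edge': nd_edge,
            'exp_moves': exp_moves, 'js_filt': js_filt}


parts = split_xlref_sketch(
    'file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}')
```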
# relasso(values) invoked internally.
############### # FILTER-DEFS ###############
sig = func and str(inspect.signature(func)) return '\n\nFilter: %s%s:\n%s' % (name, sig, desc)
""" Identifies rect from its edge-coordinates (row, col, 2d-table)..
:param Coords st: the top-left edge of capture-rect, inclusive :param Coords or None nd: the bottom-right edge of capture-rect, inclusive :return: in int based on the input like that:
- 0: only `st` given - 1: `st` and `nd` point the same cell - 2: row - 3: col - 4: 2d-table
Examples::
>>> _classify_rect_shape((1,1), None) 0 >>> _classify_rect_shape((2,2), (2,2)) 1 >>> _classify_rect_shape((2,2), (2,20)) 2 >>> _classify_rect_shape((2,2), (20,2)) 3 >>> _classify_rect_shape((2,2), (20,20)) 4 """
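A classification that agrees with the doctests above can be written in a few comparisons. This is a sketch assuming `(row, col)` pairs; `classify_rect_shape_sketch` is a hypothetical re-implementation, not the module's code.

```python
def classify_rect_shape_sketch(st, nd):
    """Return 0-4 for: only-st, single cell, row, col, 2d-table."""
    if nd is None:
        return 0
    same_row = st[0] == nd[0]
    same_col = st[1] == nd[1]
    if same_row and same_col:
        return 1   # both edges point to the same cell
    if same_row:
        return 2   # one row spanning several columns
    if same_col:
        return 3   # one column spanning several rows
    return 4       # 2d-table
```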
""" Append trivial dimensions to the left.
:param values: The scalar or 2D-results of :meth:`Sheet.read_rect()` :param int new_dim: The new dimension the result should have """
""" Squeeze it, and then flatten it, before inflating it.
:param values: The scalar or 2D-results of :meth:`Sheet.read_rect()` :param int new_dim: The new dimension the result should have """ else:
""" Reshapes the :term:`capture-rect` values of :func:`read_capture_rect()`.
:param values: The scalar or 2D-results of :meth:`Sheet.read_rect()` :type values: (nested) list, * :param new_ndim: :type new_ndim: int, (int, bool) or None
:return: reshaped values :rtype: list of lists, list, *
Examples::
>>> _redim([1, 2], 2) [[1, 2]]
>>> _redim([[1, 2]], 1) [1, 2]
>>> _redim([], 2) [[]]
>>> _redim([[3.14]], 0) 3.14
>>> _redim([[11, 22]], 0) [11, 22]
>>> arr = [[[11], [22]]] >>> arr == _redim(arr, None) True
>>> _redim([[11, 22]], 0) [11, 22] """ return values
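The doctests above can be reproduced with a much simpler routine than the real one. The sketch below only wraps values in extra outer lists or strips outer lists holding a single item; it does not flatten or transpose, and `ndim_sketch` and `redim_sketch` are hypothetical names.

```python
def ndim_sketch(values):
    """Count nesting depth by following the first element of each list."""
    n = 0
    while isinstance(values, list):
        n += 1
        if not values:
            break
        values = values[0]
    return n


def redim_sketch(values, new_ndim):
    """Add or remove trivial outer dimensions to approach `new_ndim`."""
    if new_ndim is None:
        return values
    cur = ndim_sketch(values)
    while cur < new_ndim:
        # Upscale: wrap in a trivial outer dimension.
        values, cur = [values], cur + 1
    while cur > new_ndim and isinstance(values, list) and len(values) == 1:
        # Downscale: drop an outer dimension only if it holds a single item.
        values, cur = values[0], cur - 1
    return values
```

Note how `[[11, 22]]` reduced to dimension 0 stops at `[11, 22]`, because a two-item list cannot be squeezed into a scalar.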
"""A list :term:`call-spec` for :meth:`_redim_filter` :term:`filter` that imitates results of *xlwings* library."""
scalar=None, cell=None, row=None, col=None, table=None): """ Reshape and/or transpose captured values, depending on rect's shape.
Each dimension might be a single int or None, or a pair [dim, transpose]. """
""" The default available :term:`filters` used by :func:`lasso()` when constructing its internal :class:`Ranger`.
:param dict or None overrides: Any items to update the default ones.
:return: a dict-of-dicts with 2 items:
- *func*: a function with args: ``(Ranger, Lasso, *args, **kwds)`` - *desc*: help-text replaced by ``func.__doc__`` if missing.
:rtype: dict """ 'pipe': { 'func': Ranger.pipe_filter, }, 'recurse': { 'func': Ranger.recursive_filter, }, 'redim': { 'func': redim_filter, }, 'numpy': { 'func': lambda ranger, lasso, * args, **kwds: lasso._replace( values=np.array(lasso.values, *args, **kwds)), 'desc': np.array.__doc__, }, 'dict': { 'func': lambda ranger, lasso, * args, **kwds: lasso._replace( values=dict(lasso.values, *args, **kwds)), 'desc': dict.__doc__, }, 'odict': { 'func': lambda ranger, lasso, * args, **kwds: lasso._replace( values=OrderedDict(lasso.values, *args, **kwds)), 'desc': OrderedDict.__doc__, }, 'sorted': { 'func': lambda ranger, lasso, * args, **kwds: lasso._replace( values=sorted(lasso.values, *args, **kwds)), 'desc': sorted.__doc__, }, }
'names') is None else None # , convert_float=True,
'df': { 'func': _df_filter, 'desc': parsers.TextParser.__doc__, }, 'series': { 'func': lambda ranger, lasso, *args, **kwds: pd.Series(OrderedDict(lasso.values), *args, **kwds), 'desc': ("Converts a 2-columns list-of-lists into pd.Series.\n" + pd.Series.__doc__), } }) except ImportError as ex: msg = "The 'df' and 'series' filters were not installed, due to: %s" log.info(msg, ex)
filters.update(overrides)
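The registry shape described above (a dict of `{'func': ..., 'desc': ...}` items, updated by `overrides`) can be sketched as follows. The filters here take bare values rather than `(Ranger, Lasso, ...)`, and `default_filters_sketch` and `filter_help_sketch` are hypothetical names.

```python
def default_filters_sketch(overrides=None):
    """Build a small filter registry, letting `overrides` replace items."""
    filters = {
        'sorted': {
            'func': lambda values, *args, **kwds: sorted(values, *args, **kwds),
            'desc': sorted.__doc__,
        },
        'dict': {
            'func': lambda values, *args, **kwds: dict(values, *args, **kwds),
            'desc': dict.__doc__,
        },
    }
    if overrides:
        filters.update(overrides)
    return filters


def filter_help_sketch(spec):
    """Use 'desc' when given, else fall back to the function's docstring."""
    return spec.get('desc') or spec['func'].__doc__


def upper(values):
    """Upper-case every string item."""
    return [v.upper() for v in values]


filters = default_filters_sketch({'upper': {'func': upper}})
```

Since `update()` replaces whole items, an override with an existing name swaps out both its `func` and `desc`.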
""" Default :term:`opts` used by :func:`lasso()` when constructing its internal :class:`Ranger`.
:param dict or None overrides: Any items to update the default ones. """ 'lax': False, 'verbose': False, 'read': {'on_demand': True, }, }
opts.update(overrides)
base_opts=None, available_filters=None): """ Makes a defaulted :class:`Ranger`.
:param sheets_factory: Factory of sheets from where to parse rect-values; if unspecified, a new :class:`SheetsFactory` is created. Remember to invoke its :meth:`SheetsFactory.close()` to clear resources from any opened sheets. :param dict or None base_opts: Default opts to affect the lassoing, to be merged with defaults; uses :func:`get_default_opts()`.
Read the code to be sure what are the available choices :-(. :param dict or None available_filters: The available :term:`filters` to specify a :term:`xl-ref`. Uses :func:`get_default_filters()` if unspecified.
""" base_opts or get_default_opts(), available_filters or get_default_filters())
sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds): """ High-level function to :term:`lasso` around spreadsheet's rect-regions according to :term:`xl-ref` strings by using internally a :class:`Ranger`.
:param str xlref: a string with the :term:`xl-ref` format::
<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
i.e.::
file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
:param sheets_factory: Factory of sheets from where to parse rect-values; if unspecified, the new :class:`SheetsFactory` created is closed afterwards. Delegated to :func:`make_default_Ranger()`, so items override default ones; use a new :class:`Ranger` if that is not desired. :param dict or None base_opts: Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every :meth:`Ranger.do_lasso()`, whether invoked directly or recursively by :meth:`Ranger.recursive_filter()`. Read the code to be sure what are the available choices. Delegated to :func:`make_default_Ranger()`, so items override default ones; use a new :class:`Ranger` if that is not desired. :param dict or None available_filters: Delegated to :func:`make_default_Ranger()`, so items override default ones; use a new :class:`Ranger` if that is not desired. :param bool return_lasso: If `True`, values are contained in the returned Lasso instance, along with all other artifacts of the :term:`lassoing` procedure.
For more debugging help, create a :class:`Ranger` yourself and inspect the :attr:`Ranger.intermediate_lasso`. :param Lasso context_kwds: Default :class:`Lasso` fields in case parsed ones are `None` (i.e. you can specify the sheet like that).
:return: Either the captured & filtered values or the final :class:`Lasso`, depending on the `return_lasso` arg. """
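The rule that a factory is closed only when the function created it itself is a common ownership pattern, sketched below with a `try`/`finally`. `FactorySketch` and `lasso_sketch` are hypothetical stand-ins and perform no real spreadsheet access.

```python
class FactorySketch:
    """Hypothetical factory that only tracks whether it was closed."""

    def __init__(self):
        self.closed = False

    def read(self, xlref):
        return 'values of %s' % xlref

    def close(self):
        self.closed = True


created = []   # kept only so the example can inspect internal factories


def lasso_sketch(xlref, sheets_factory=None):
    owns_factory = sheets_factory is None
    if owns_factory:
        sheets_factory = FactorySketch()
        created.append(sheets_factory)
    try:
        return sheets_factory.read(xlref)
    finally:
        # Close only what this call created; a caller's factory stays open.
        if owns_factory:
            sheets_factory.close()


mine = FactorySketch()
values = lasso_sketch('A1:C3', mine)
values2 = lasso_sketch('A1:C3')
```

A caller that passes its own factory can reuse the opened sheets across several calls and close them once at the end.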
base_opts=base_opts, available_filters=available_filters) finally: