Coverage for lingpy/basic/parser.py : 97%

# *-* coding: utf-8 *-*
"""
Basic parser for text files in QLC format.
"""
""" Basic class for the handling of text files in QLC format.
"""
def unpickle(filename):
""" Parse data regularly if the data has not been loaded from a pickled version. """
# try to load the data
# check whether it's a dictionary from which we load
# make check for correct input, there was a bug with a wrong
# evaluation which is hopefully fixed by now
print(input_data[0], input_data[tmp_keys[0]])
raise ValueError("[!] Wrong input format!")  # pragma: no cover
# check whether it's another wordlist-object
# or whether the data is an actual file
# raise an error otherwise
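Taken together, these comments describe a type dispatch on ``filename``.
A minimal sketch of that dispatch, assuming a hypothetical ``read_qlc``
helper in place of the actual QLC file reader::

    import os

    def read_qlc(path):
        # hypothetical stand-in for lingpy's QLC file reader
        raise NotImplementedError

    def load_input(filename):
        if isinstance(filename, dict):
            # a dictionary from which we load directly
            return filename
        if hasattr(filename, '_data') and hasattr(filename, 'header'):
            # another wordlist-object: copy its rows and rebuild row 0
            # from the header, sorted by column index
            data = dict(filename._data.items())
            data[0] = [key for key, _ in sorted(
                filename.header.items(), key=lambda x: x[1], reverse=False)]
            return data
        if isinstance(filename, str) and os.path.isfile(filename):
            # the data is an actual file on disk
            return read_qlc(filename)
        # raise an error otherwise
        raise IOError("Cannot parse input of type {0}.".format(
            type(filename).__name__))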
# load the configuration file
# read the file defined by its path in conf
# define two attributes, _alias and _class, which store the aliases and
# the datatypes (classes) of the given entries
# make sure the name itself is there
# add the aliases
# append the names in data[0] to self.conf to make sure that all data
# is covered, even the types which are not specifically defined in the
# conf file. the datatype defaults here to "str"
# add empty alias for empty strings XXX why was that? I can't remember
# why this was important XXX
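A sketch of how the configuration file could be read into the two
attributes; the tab-separated "datatype, name, aliases" layout assumed
here is illustrative, not necessarily lingpy's exact conf format::

    def load_conf(path):
        _alias, _class = {}, {}
        datatypes = {'str': str, 'int': int, 'float': float}
        with open(path, encoding='utf-8') as f:
            for line in f:
                if not line.strip() or line.startswith('#'):
                    continue
                dtype, name, aliases = line.strip().split('\t')
                # make sure the name itself is there
                _alias[name] = name
                _class[name] = datatypes.get(dtype, str)
                # add the aliases, sharing the datatype of the name
                for alias in aliases.split(','):
                    _alias[alias] = name
                    _class[alias] = _class[name]
        return _alias, _class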
# the header stores the indices of the data in the original data
# dictionary
self.header = dict(
    zip([self._alias[x] for x in input_data[0]],
        range(len(input_data[0]))))
# now create a specific header which has all aliases
# add a sorted header for reference
# assign all aliases to the header
# assign the data as attribute to the word list class. Note that we
# need to check for the type here, but since numpy also offers integer
# types, we don't check for type(x) == int, but instead use the
# str.isnumeric-function that returns numeric values only if it is an
# integer
self._data = {
    int(k): v for k, v in input_data.items()
    if k != 0 and str(k).isnumeric()}
# check for same length of all columns
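The column-length check mentioned in the last comment could look like
this; the original error wording is lost, so the message is
reconstructed::

    def check_columns(data, header):
        # every row must provide exactly one value per header column
        for k, v in data.items():
            if k == 0:
                continue
            if len(v) != len(header):
                raise ValueError(
                    "Row {0} has {1} values, but the header defines "
                    "{2} columns.".format(k, len(v), len(header)))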
# iterate over self._data and change the values according to the
# functions (only needed when reading from file)
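A sketch of that conversion pass; the warning keeps only the fields
recoverable from the original format string (expected datatype,
received value, row, and entry)::

    import logging

    def convert_cells(data, header, classes):
        # apply each column's datatype function to the raw cell values
        # and warn, without aborting, when a conversion fails
        for key in [k for k in data if k != 0]:
            for head, i in header.items():
                try:
                    data[key][i] = classes[head](data[key][i])
                except ValueError:
                    logging.warning(
                        "Expected «%s» as datatype but received «%s» "
                        "(ROW: %s, entry %s).",
                        classes[head].__name__, data[key][i], key, head)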
# create entry attribute of the wordlist
# assign meta-data
"""run `eval` on the string representations."""
""" Method allows quick access to the data by passing the integer key.
Parameters
----------
idx : { int, str, tuple }
    The index passed to the method: an integer returns the respective
    line of your parsed data; a string accesses the values stored in
    the _meta-attribute of your parsed data; a tuple consists of an
    integer key for the line and a string key for the respective
    field.
Examples
--------
Load LingPy and the test_data function which gives us access to the
files in the test dataset accompanying LingPy (we use a Wordlist
object, but this works likewise with QLCParser, LexStat, and
Alignments)::
>>> from lingpy import *
>>> from lingpy.tests.util import test_data
>>> wl = Wordlist(test_data('KSL.qlc'))
Get the first line in the data::
>>> wl[1]
['Albanian', 'all', '1', 'gjithë', 'ɟiθ', ['ɟ', 'i', 'θ'], 4]
Get the first line in the data and specify the value for the
"doculect" column::

>>> wl[1,'doculect']
'Albanian'
Get the attribute "doculect" which is stored in the _meta-attribute of the Wordlist object and contains all language names in the data::
>>> wl.doculect
['Albanian', 'English', 'French', 'German', 'Hawaiian', 'Navajo', 'Turkish']
Notes
-----
This method raises a KeyError if

* you pass two indices and the first index is not a valid ID for any
  line in your data,
* you pass one index and this index corresponds neither to a valid
  line ID in your data nor to a valid key of the _meta-attribute of
  the Parser object.

It returns None if you pass a valid index, but the column in your data
does not exist.
"""
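Condensed into a toy class (a simplification, not lingpy's actual
implementation), the lookup rules read::

    class MiniParser:
        def __init__(self, data, header, meta):
            self._data, self._header, self._meta = data, header, meta

        def __getitem__(self, idx):
            if isinstance(idx, tuple):
                # two indices: a line ID plus a column name
                if idx[0] not in self._data:
                    raise KeyError(idx[0])
                if idx[1] not in self._header:
                    return None  # valid line, but the column is missing
                return self._data[idx[0]][self._header[idx[1]]]
            if idx in self._data:
                # an integer key returns the whole line
                return self._data[idx]
            if idx in self._meta:
                # a string key falls back to the _meta-attribute
                return self._meta[idx]
            raise KeyError(idx)

    wl = MiniParser(
        {1: ['Albanian', 'all']},
        {'doculect': 0, 'concept': 1},
        {'doculect': ['Albanian']})
    assert wl[1, 'doculect'] == 'Albanian'
    assert wl['doculect'] == ['Albanian']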
""" Modify a specific cell in a specific column of a wordlist. """ if isinstance(idx, tuple) and len(idx) == 2: try: except KeyError: idx[0])) else: raise ValueError("__setitem__ requires two values as key.")
""" Length of a Wordlist is the number of counterparts. """
""" Iteration is overloaded by iterating over all keys in the basic data. """
""" Store the QLCParser instance in a pickle file.
Notes
-----
The function stores a binary file called ``FILENAME.pkl`` with
``FILENAME`` corresponding to the name of the original file in the
`user cache dir <https://github.com/ActiveState/appdirs#some-example-output>`_
for lingpy on your system. To restore the instance from the pickle
call :py:meth:`~lingpy.basic.parser.QLCParser.unpickle`.
"""
# we reset the _class attribute, because it may contain unpicklable
# stuff, like `eval`ed lambdas.
# after pickling we have to recreate the attribute.
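A sketch of the store-and-restore dance around pickling; the cache path
is simplified to a temp directory here::

    import os
    import pickle
    import tempfile

    def dump_instance(obj, filename, cache_dir=None):
        # lingpy uses the user cache dir (see the appdirs link above);
        # a temp dir stands in for it in this sketch
        path = os.path.join(
            cache_dir or tempfile.gettempdir(), filename + '.pkl')
        # reset _class before pickling, since it may hold eval'ed
        # lambdas, and restore it afterwards
        saved, obj._class = obj._class, {}
        try:
            with open(path, 'wb') as f:
                pickle.dump(obj, f)
        finally:
            obj._class = saved
        return path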
def add_entries(
        self, entry, source, function, override=False, **keywords):
    """
    Add new entry-types to the word list by modifying given ones.
Parameters
----------
entry : string
    A string specifying the name of the new entry-type to be added to
    the word list.
source : string
    A string specifying the basic entry-type that shall be modified. If
    multiple entry-types shall be used to create a new entry, they
    should be passed as a single comma-separated string.
function : function
    A function which is used to convert the source into the target
    value.
keywords : dict
    A dictionary of keywords that are passed as parameters to the
    function.
Notes
-----
This method can be used to add new entry-types to the data by
converting given ones. There are a lot of possibilities for adding new
entries, but the most basic procedure is to use an existing entry-type
and to modify it with the help of a function.
"""
def add_entries(
        self, entry, source, function, override=False, **keywords):
    # check for empty entries etc.
else:
# check for override stuff, which otherwise causes an error message
# check whether the stuff is already there; `confirm` is assumed to be
# an interactive yes/no prompt
if not override and not confirm(
        "Column <{entry}> already exists, do you want to override?".format(
            entry=entry)):
    return  # pragma: no cover
# get the new index into the header
# add a new alias if this is not specified
# get the true value
# get the new index
# change the aliased header for each entry in alias2
# modify the entries attribute
# check for multiple entries (separated by comma)
# iterate over the data and create the new entry
# if the source is a dictionary, this dictionary will be directly added
# to the original data-storage of the wordlist
else:
    # get the index of the source in self
sorted(
    set([self._data[k][idx] for k in self._data
         if k != 0 and isinstance(k, int)]),
    key=key)
# define rows and cols as attributes of the word list
# define height and width of the word list
# row and column index point to the place where the data of the main # items is stored in the original dictionary
# create a basic array which assigns ids for the entries in a starling
# manner.
# first, find out how many items (== synonyms) there are maximally for
# each row
for k in [k for k in self._data if k != 0 and str(k).isnumeric()]:
# We must cast to a regular dict to make the attribute picklable.
# create the array by counting the maximal number of occurrences, store
# the row names separately in a dictionary
# get maximal amount of "synonyms"
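A sketch of the counting step; taking the row-label column as a
parameter (``rowidx``) rather than reading it from the instance is a
simplification::

    from collections import defaultdict

    def synonym_counts(data, rowidx):
        # count how many entries (== synonyms) each row label, e.g.
        # each concept, has; the maximum gives the width of the array
        counts = defaultdict(int)
        for k in [k for k in data if k != 0 and str(k).isnumeric()]:
            counts[data[k][rowidx]] += 1
        # cast to a regular dict to keep the attribute picklable
        return dict(counts)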
""" Define how attributes are overloaded. """
# get the right name
return self.get_entries(nattr)
""" Return all entries matching the given entry-type as a two-dimensional list.
Parameters
----------
entry : string
    The entry-type of the data that shall be returned in tabular
    format.
"""
[self[cell][self._header[entry]] if cell != 0 else 0 for cell in row])
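A sketch of the whole method built around that surviving fragment;
``_array`` as the name of the id-array attribute is an assumption::

    def get_entries(wordlist, entry):
        # walk the array built in __init__: each row holds the integer
        # IDs of all entries for one row label, with 0 padding the
        # empty slots
        out = []
        for row in wordlist._array:
            out.append(
                [wordlist[cell][wordlist._header[entry]]
                 if cell != 0 else 0 for cell in row])
        return out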