Module reference

DataMatrix

class datamatrix.DataMatrix(fh=None, row_names=None, header=True, delimiter='\t', quoting=0, quotechar="'", fixR=False, skip=0)

Pythonic implementation of R’s data.frame structure. This class loads a file or a file-like object and stores it as a dictionary that maps each column to a list of (row name, value) tuples. Column values can optionally be retrieved on their own, and single rows can be queried. Missing values are filled in as NA.

The column used for row names (row_names) can be specified; otherwise rows are numbered sequentially. If the file has a header (header=True, the default), it is used to name the fields; otherwise the fields are numbered sequentially, with the first field named “x” (as R does).
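The storage layout described above can be sketched with the standard csv module. This is a minimal Python 3 illustration, not the library's code: load_columns is a hypothetical helper, it assumes the file has a header, and treating empty fields as NA is an assumption on top of the documented NA filling for missing fields.

```python
import csv
import io


def load_columns(fh, row_names=None, delimiter="\t"):
    """Hypothetical sketch: map each column name to (row name, value) tuples.

    Assumes the first line is a header naming the fields.
    """
    # restval="NA" fills in fields missing from short rows
    reader = csv.DictReader(fh, delimiter=delimiter, restval="NA")
    columns = {name: [] for name in reader.fieldnames if name != row_names}
    for number, record in enumerate(reader, start=1):
        # Use the chosen column as the row name, else number rows from 1
        row_id = record[row_names] if row_names else str(number)
        for name in columns:
            value = record[name]
            # Assumption: empty fields are also treated as missing
            columns[name].append((row_id, value if value != "" else "NA"))
    return columns


sample = io.StringIO("id\ta\tb\nr1\t1\t2\nr2\t3\n")
print(load_columns(sample, row_names="id"))
```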

If you are loading a text file saved by R with row.names=TRUE, the topmost, leftmost field of the header will be blank. To parse such files, set fixR=True when creating the instance.
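The blank top-left cell problem can be illustrated with a short Python 3 sketch. read_r_table and the corner_name default are hypothetical; this only shows one plausible way to handle such a header and is not how fixR is implemented.

```python
import csv
import io


def read_r_table(fh, corner_name="row_names", delimiter="\t"):
    """Hypothetical sketch: name R's blank top-left header cell, then read rows."""
    reader = csv.reader(fh, delimiter=delimiter)
    header = next(reader)
    # R writes an empty field ("") above the row-name column
    if header and header[0] == "":
        header[0] = corner_name
    return [dict(zip(header, row)) for row in reader]


# R quotes strings by default; the csv module strips those quotes
r_output = io.StringIO('""\t"a"\t"b"\n"r1"\t1\t2\n')
print(read_r_table(r_output))
```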

Other options, including the delimiter, line terminator and quoting, are passed directly to the csv.DictReader instance that reads the file. See the csv module documentation for more details.
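For reference, this is how options like those in the signature look when given to csv.DictReader directly (Python 3, standard library only). The sample data is made up; it shows that quoting=0 is csv.QUOTE_MINIMAL and that a field quoted with quotechar="'" can contain the tab delimiter.

```python
import csv
import io

# quoting=0 in the DataMatrix signature corresponds to csv.QUOTE_MINIMAL
print(csv.QUOTE_MINIMAL)

text = "name\tnote\nr1\t'tab\tinside'\n"
reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                        quoting=csv.QUOTE_MINIMAL, quotechar="'")
# The quoted field keeps its embedded tab instead of being split
rows = [dict(row) for row in reader]
print(rows)
```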

Notable methods are:
  • getRow - returns a specific row
  • view - outputs a tab-delimited view of a number of lines; the start line and the number of lines shown are configurable
  • append - adds a column (of the same length and with the same identifiers) to the matrix, roughly equivalent to R’s cbind
  • appendRow - adds a row at the end of the matrix, with one value per column; similar to R’s rbind
  • insert - inserts a column at a specified index
  • insertRow - similar to insert, but works with rows
  • iterRows - cycles through rows
  • getRowByID - gets a row with a specified row name

append(other, column_name)
Method to append a column. It needs a sequence (tuple or list) and a column name to be supplied. The sequence must be of the same length as the other columns.
appendRow(other, row_name)
Appends a row to the end of the matrix. The row must cover all the columns (i.e., it must contain one value per column). The row name is given by the mandatory parameter row_name.
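The length checks that append and appendRow require can be sketched on a minimal column-oriented table in Python 3. ColumnTable and append_row are hypothetical stand-ins that mirror the documented rules; they are not the DataMatrix implementation.

```python
class ColumnTable:
    """Hypothetical stand-in: column name -> list of values, plus row names."""

    def __init__(self, row_names, columns):
        self.row_names = list(row_names)
        self.columns = {name: list(values) for name, values in columns.items()}

    def append(self, other, column_name):
        # Like R's cbind: the new column must match the current row count
        if len(other) != len(self.row_names):
            raise ValueError("column length does not match the number of rows")
        self.columns[column_name] = list(other)

    def append_row(self, other, row_name):
        # Like R's rbind: the new row must supply one value per column
        if len(other) != len(self.columns):
            raise ValueError("row length does not match the number of columns")
        for values, value in zip(self.columns.values(), other):
            values.append(value)
        self.row_names.append(row_name)


table = ColumnTable(["r1"], {"a": [1]})
table.append([2], "b")
table.append_row([3, 4], "r2")
print(table.row_names, table.columns)
```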
getColumn(key, column_name=False)
Returns a specific column as a list, without the row identifiers. Optionally, the column name can be printed. DEPRECATED: use datamatrix[colname] instead.
getRow(row_number, columns='all', row_name=True)
Returns a specific row, identified by its row number, as a list. The columns parameter selects which columns are output (default: all).
getRowByID(rowId, **kwargs)
Fetches a specific row based on its identifier. If there is no match, a ValueError is raised.
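A lookup with this error behaviour falls out naturally from list.index, which raises ValueError for a missing item. The Python 3 sketch below is hypothetical (get_row_by_id is not a library function); putting the row name first follows the iterRows doctest output, as an assumption.

```python
def get_row_by_id(row_names, columns, row_id):
    """Hypothetical sketch: return [row name, values...] for row_id."""
    # list.index raises ValueError when row_id is absent
    position = row_names.index(row_id)
    return [row_id] + [values[position] for values in columns.values()]


names = ["r1", "r2"]
data = {"a": [1, 2], "b": [3, 4]}
print(get_row_by_id(names, data, "r2"))
```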
insert(other, column_name, column_no)
Method to insert a column at a specified column index.
insertRow(other, row_name, lineno)
Method that inserts a row at a specified line number.
iterRows(**kwargs)

Iterate over a matrix’s rows.

>>> from StringIO import StringIO
>>> matrixfile = StringIO(
... '''a b c d
... 3 3 3 3
... 2 2 2 2''')
>>> matrix =  DataMatrix(matrixfile)
>>> for row in matrix.iterRows():
...     print row
['1', '3 3 3 3']
['2', '2 2 2 2']
pop(index=-1)
Analogous to the pop method of lists, except that it removes a row and returns the removed row. If no index (i.e., row number) is supplied, the last row is removed.
rename(oldColumn, newColumn)
Renames a column.
replace(other, colName)
Replaces a column with another.
view(lines=10, start_at=1, *args, **kwargs)
Prints the table on screen. The number of lines and the starting line can be set with the lines and start_at parameters. Optional parameters are passed to getRow to select which columns are printed.
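The windowing that view performs can be sketched as a slice over the rows in Python 3. view_lines is hypothetical, and treating start_at as 1-based is an assumption inferred only from its default value of 1.

```python
def view_lines(rows, lines=10, start_at=1):
    """Hypothetical sketch: tab-join `lines` rows starting at start_at.

    Assumption: start_at counts from 1, as its default suggests.
    """
    first = start_at - 1
    selected = rows[first:first + lines]
    return "\n".join("\t".join(str(field) for field in row) for row in selected)


print(view_lines([["1", "a"], ["2", "b"], ["3", "c"]], lines=2, start_at=2))
```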

EmptyMatrix

class datamatrix.EmptyMatrix(identifier=None, row_names=None, columns=None)
DataMatrix variant that, once instantiated, generates an empty matrix with the specified columns and row names. It does not depend on reading a file. After initialization, rows can be added with the insertRow and appendRow methods, and columns with the insert and append methods.

Functions

datamatrix.writeMatrix(data_matrix, fh=None, delimiter='\t', lineterminator='\n', quoting=0, header=False, row_names=True, *args, **kwargs)
Saves DataMatrix objects to files or file-like objects. A file handle is a mandatory parameter, along with the data matrix object to save. Additional parameters can optionally be passed to getRow to select which columns are saved.
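The writing side can be sketched with csv.writer in Python 3. write_table and its fieldnames parameter are hypothetical; the sketch only mirrors the documented defaults (tab delimiter, '\n' line terminator, quoting=0). The explicit lineterminator matters because csv.writer defaults to '\r\n'.

```python
import csv
import io


def write_table(rows, fh, fieldnames=None, delimiter="\t",
                lineterminator="\n", quoting=csv.QUOTE_MINIMAL):
    """Hypothetical sketch: write an optional header line, then the rows."""
    writer = csv.writer(fh, delimiter=delimiter,
                        lineterminator=lineterminator, quoting=quoting)
    if fieldnames is not None:
        writer.writerow(fieldnames)
    writer.writerows(rows)


buffer = io.StringIO()
write_table([["r1", 1, 2]], buffer, fieldnames=["id", "a", "b"])
print(buffer.getvalue())
```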
datamatrix.elementApply(matrix, func, columns=None)
Applies a function to every element (each column of each row) and returns a new matrix as the result. Any type conversion the function requires must be done by the user. If the columns parameter is not None, the function is applied only to the specified columns.
datamatrix.matrixApply(matrix, func, what='rows', resultName='Function result')
Applies a user-specified function to all rows or all columns. Any type conversion the function requires must be done by the user. The function must process the row (or the column) and return a single value. The result is a DataMatrix instance containing one row (or one column) with the function results. The name of that column (or row) can be changed with the resultName parameter.
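The reduce-to-one-value idea can be sketched in Python 3 on a plain dict of columns. matrix_apply is hypothetical and returns a dict rather than a DataMatrix; the float conversion inside the lambda shows the user-side type conversion the text asks for.

```python
def matrix_apply(row_names, columns, func, what="rows"):
    """Hypothetical sketch: reduce each row (or column) to a single value."""
    if what == "rows":
        # zip(*...) walks the columns in parallel, yielding one row at a time
        rows = zip(*columns.values())
        return {name: func(list(row)) for name, row in zip(row_names, rows)}
    if what == "columns":
        return {name: func(list(values)) for name, values in columns.items()}
    raise ValueError("what must be 'rows' or 'columns'")


data = {"a": ["1", "2"], "b": ["3", "4"]}
# Values are strings, so the function converts them itself
print(matrix_apply(["r1", "r2"], data, lambda v: sum(map(float, v))))
```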
datamatrix.filterMatrix(matrix, func, column)
Returns a new DataMatrix containing only the rows whose value in the given column satisfies a criterion. In particular, “func” is applied to each row’s value in the column of interest and should return True if the row is to be included, and False otherwise.
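Row filtering on one column can be sketched in Python 3 by collecting the indices that pass and rebuilding every column from them. filter_rows is a hypothetical helper, not the library function.

```python
def filter_rows(row_names, columns, func, column):
    """Hypothetical sketch: keep rows whose value in `column` passes func."""
    keep = [i for i, value in enumerate(columns[column]) if func(value)]
    new_names = [row_names[i] for i in keep]
    # Every column is filtered with the same indices so rows stay aligned
    new_columns = {name: [values[i] for i in keep]
                   for name, values in columns.items()}
    return new_names, new_columns


data = {"a": ["1", "5", "3"], "b": ["x", "y", "z"]}
print(filter_rows(["r1", "r2", "r3"], data, lambda v: float(v) > 2, "a"))
```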
datamatrix.meanRows(sourceMatrix)
Convenience function to calculate the mean of the rows of a DataMatrix instance.
datamatrix.meanColumns(sourceMatrix)
Convenience function to calculate the mean of the columns of a DataMatrix instance.
datamatrix.transpose(matrix, identifier='x')
Transposes a DataMatrix object: rows become columns and vice versa. The optional parameter identifier sets the identifier name of the resulting matrix.
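In a column-oriented layout, transposition amounts to zipping the columns. The Python 3 sketch below uses a hypothetical transpose_columns helper and ignores the identifier parameter.

```python
def transpose_columns(row_names, columns):
    """Hypothetical sketch: rows become columns and vice versa."""
    column_names = list(columns)
    rows = list(zip(*columns.values()))  # one tuple per original row
    # Old row names become column names; old column names become row names
    new_columns = {row_name: list(row) for row_name, row in zip(row_names, rows)}
    return column_names, new_columns


print(transpose_columns(["r1", "r2"], {"a": [1, 2], "b": [3, 4]}))
```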
datamatrix.apply(matrix, func, column, whole=False)
Applies a function to a specific column (rows are not supported yet). The operation is performed in place. If the parameter “whole” is True, the function is applied to the whole column at once rather than to each of its elements.
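The difference between per-element and whole-column application can be sketched in Python 3. apply_column is hypothetical, and requiring the whole-column function to return a sequence of the same length is an assumption.

```python
def apply_column(columns, func, column, whole=False):
    """Hypothetical sketch of in-place column application."""
    if whole:
        # Assumption: func takes the entire column and returns a same-length sequence
        columns[column] = list(func(columns[column]))
    else:
        # func is called once per element
        columns[column] = [func(value) for value in columns[column]]


data = {"a": ["1", "2"]}
apply_column(data, float, "a")
# Centre the column on its mean, which needs the whole column at once
apply_column(data, lambda xs: [x - sum(xs) / len(xs) for x in xs], "a", whole=True)
print(data)
```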
datamatrix.cbind(*args)
Joins a list of matrices by their columns. All matrices must have the same number of rows and must not share any column names. Returns a DataMatrix instance with the joined result.
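The two cbind preconditions can be sketched in Python 3 on (row names, columns) pairs. cbind_columns is hypothetical, and taking the row names from the first matrix is an assumption.

```python
def cbind_columns(*tables):
    """Hypothetical sketch: each table is a (row_names, columns) pair."""
    # Assumption: the joined result keeps the first table's row names
    row_names = tables[0][0]
    merged = {}
    for names, columns in tables:
        if len(names) != len(row_names):
            raise ValueError("all matrices must have the same number of rows")
        for name, values in columns.items():
            if name in merged:
                raise ValueError("duplicate column name: " + name)
            merged[name] = list(values)
    return list(row_names), merged


print(cbind_columns((["r1"], {"a": [1]}), (["r1"], {"b": [2]})))
```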
datamatrix.rbind(first_matrix, second_matrix)
Joins two matrices by their rows. The number of columns of the two matrices must be identical. Returns a new DataMatrix instance with the joined result.
datamatrix.subset(matrix, column_list)
Creates a new matrix containing only a subset of the columns, as specified by the column_list parameter.
