Usage

Invocation

DataMatrix requires a file or file-like object. A typical invocation is:

import datamatrix
matrix = datamatrix.DataMatrix(open("somefile"), header=True)

Aside from the file object, which is mandatory, DataMatrix accepts a number of optional parameters. The header parameter tells DataMatrix whether the file to read has a header: if it does, the header is used to assign names to the columns; otherwise each column is identified by a number. To specify the column where row names are located, use the row_names parameter:

matrix = datamatrix.DataMatrix(open("somefile"), header=True, row_names=1)

In this case, row names are obtained from the first column in the file.

If you are loading a file whose header has an empty first element (as is the case with files saved by R), you must set the fixR parameter to True to work around this issue; otherwise the results are unpredictable.
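
To make the "empty first element" case concrete, the sketch below parses a small made-up sample with the standard csv module. The sample imitates a file written by R with row names enabled; the data and variable names are assumptions for illustration, and the code does not show how fixR itself is implemented.

```python
import csv
import io

# Made-up sample imitating a CSV file saved by R with row names enabled:
# the first header field is an empty string.
raw = '"","Name","surname"\n"1","Albert","Einstein"\n"2","Groucho","Marx"\n'

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]

# The header has the same number of fields as each data row,
# but its first field carries no column name.
first_is_empty = header[0] == ""
```

Here header is ['', 'Name', 'surname'], so the first column has no usable name unless the loader compensates for it.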

DataMatrix uses the csv module to do its parsing, so you can specify additional parameters to define the format of your data, such as delimiter (the separator between fields), lineterminator and quoting (how to deal with non-numeric fields). See the csv module documentation for additional details.
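
As a rough illustration of what these options do, the following sketch calls the csv module directly rather than DataMatrix. The tab-separated sample data is made up; delimiter and quoting are genuine csv options, but how DataMatrix forwards them is not shown here.

```python
import csv
import io

# Made-up tab-separated sample: text fields are quoted, numbers are not.
raw = '"id"\t"Name"\t"surname"\n1\t"Albert"\t"Einstein"\n2\t"Groucho"\t"Marx"\n'

# delimiter sets the field separator; with QUOTE_NONNUMERIC the reader
# converts every unquoted field to a float.
reader = csv.reader(io.StringIO(raw), delimiter="\t",
                    quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)
# rows[1] is [1.0, 'Albert', 'Einstein']
```

With the default quoting setting, the same file would yield the string '1' instead of the float 1.0.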

Basic operations

If you print a DataMatrix instance, you’ll get some basic information:

>>> print matrix
    File name:
    Column with identifier names: None (numeric)
    No. of rows: 2
    No. of columns: 2
    Columns: Name, surname

With the columns attribute you can view the columns as a list:

>>> print matrix.columns
    ['Name', 'surname']

Row names can be printed instead with the rownames attribute.

You can access specific rows with the getRow method:

>>> matrix.getRow(1)
    ['1', 'Albert', 'Einstein']

Or specific columns with a dictionary-like syntax:

>>> matrix["surname"]
    ['Einstein', 'Marx']

Changed in version 0.8.

In DataMatrix versions prior to 0.8, the getColumn method was used. This is no longer the case: the method has been marked as deprecated and will be removed in future versions.

To get a representation of your data, there is the view method:

>>> matrix.view()
    1 Albert Einstein
    2 Groucho Marx

Row and column manipulation

Columns and rows can be appended with the append and appendRow methods, respectively. In both cases, the item to be appended needs to be a sequence (list or tuple): a new column must be as long as the existing columns, and a new row must cover all the columns:

>>> profession = ["scientist", "comedian"] # new column
>>> matrix.append(profession, "Job")

>>> entry = ["Isaac", "Asimov", "writer"] # new row
>>> matrix.appendRow(entry,"3")

Notice that when you append a column or a row you must pass a column name or a row name to the method, as the examples above show. Also, the rows and columns you are appending need to be of the same length as the rows (or columns) already present in the DataMatrix instance.

Alternatively, you can insert rows and columns at a specified position using the insert (for columns) and insertRow (for rows) methods. They behave exactly like the append* methods, except that you must also supply an integer argument (1 or greater) representing the column or row number:

>>> matrix.insert(profession,"Job",2)
>>> matrix.insertRow(entry,"3",1)

New in version 0.7.

If the number is greater than the number of columns or rows available, the method automatically defaults to the append variant. Again, rows and columns must be of the same length as the ones already present in the instance.
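
The positional rule described above can be sketched in plain Python. The helper insert_at is hypothetical and is not part of DataMatrix; it assumes that the position is 1-based, that the new item ends up at that position, and that a position beyond the end falls back to appending.

```python
def insert_at(items, new_item, position):
    """Return a copy of items with new_item placed at a 1-based position.

    Hypothetical helper mirroring the documented rule, not DataMatrix code.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    result = list(items)
    if position > len(result):
        # Position past the end: behave like an append.
        result.append(new_item)
    else:
        result.insert(position - 1, new_item)
    return result


columns = ["Name", "surname"]
in_middle = insert_at(columns, "Job", 2)   # ['Name', 'Job', 'surname']
at_end = insert_at(columns, "Job", 10)     # ['Name', 'surname', 'Job']
```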

Saving DataMatrix objects

You can write DataMatrix objects to files or file-like objects with the writeMatrix function present in the module:

fh = open("somefile.txt","w")
datamatrix.writeMatrix(matrix,fh)

Output formatting is again set via options to the csv module. Optionally, you can save only a subset of the columns, specified as a list:

datamatrix.writeMatrix(matrix, fh, columns = ["Name","Job"])

If you want the header (column names) to be included, you need to set the header parameter to True:

datamatrix.writeMatrix(matrix, fh, header = True)

Further manipulation of DataMatrix objects

New in version 0.8.

For some special uses, a number of functions have been provided. elementApply applies a function to the whole matrix, matrixApply applies a function to either rows or columns, giving a single result, while filterMatrix can be used to filter rows depending on the content of a specific column. For further information, refer to the documentation strings of those functions.

You can also transpose the matrix (swap the rows and the columns) with the help of the transpose function.
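
For readers unfamiliar with the operation, this is what transposing means for a plain list of rows. The sketch uses only built-in Python and the sample rows shown earlier; it is not the module's transpose function, whose signature is not documented here.

```python
rows = [["1", "Albert", "Einstein"],
        ["2", "Groucho", "Marx"]]

# zip(*rows) groups the first items of every row, then the second items,
# and so on, so each original column becomes a row.
transposed = [list(column) for column in zip(*rows)]
# [['1', '2'], ['Albert', 'Groucho'], ['Einstein', 'Marx']]
```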

Also, two convenience functions have been provided to quickly calculate the mean of rows or columns: they are meanRows and meanColumns, respectively.
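
The two kinds of mean can be illustrated with plain Python on a made-up numeric table. This is only a sketch of the arithmetic involved; it does not call meanRows or meanColumns, and the variable names are assumptions.

```python
# Made-up numeric data: two rows, three columns.
data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# One mean per row.
row_means = [sum(row) / len(row) for row in data]
# [2.0, 5.0]

# One mean per column; zip(*data) iterates over the columns.
column_means = [sum(col) / len(col) for col in zip(*data)]
# [2.5, 3.5, 4.5]
```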