DataMatrix requires a file or file-like object. A typical invocation is:
import datamatrix
matrix = datamatrix.DataMatrix(open("somefile"), header=True)
Besides the file object, which is mandatory, several optional parameters are available. The header parameter tells DataMatrix whether the file has a header; if it does, the header is used to name the columns, otherwise each column is identified by a number. To specify the column where row names are located, use the row_names parameter:
matrix = datamatrix.DataMatrix(open("somefile"), header=True, row_names=1)
In this case, row names are obtained from the first column in the file.
If you are loading a file whose header has an empty first element (as is the case with files saved by R), you must set the fixR parameter to True to work around this; otherwise the results are unpredictable. DataMatrix uses the csv module to do its parsing, so you can pass additional parameters to define the format of your data, such as delimiter (the separator between fields), lineterminator and quoting (how to deal with non-numeric fields). See the csv module documentation for additional details.
Notice that since the csv module does not support Unicode input, using Unicode text with DataMatrix may give unpredictable results.
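The format parameters named above belong to the standard csv module. The following is a minimal sketch that calls csv.reader directly rather than DataMatrix, on made-up sample data, assuming DataMatrix forwards these options unchanged as the text suggests. It is written for Python 3, where the csv module reads text:

```python
import csv
import io

# Hypothetical sample data: semicolon-separated, quoted fields, one header line.
sample = io.StringIO('Name;surname\n"Albert";"Einstein"\n"Groucho";"Marx"\n')

# delimiter and quoting are standard csv format parameters.
reader = csv.reader(sample, delimiter=";", quoting=csv.QUOTE_MINIMAL)
rows = list(reader)

# Split the header from the data rows, as header=True is described to do.
header, records = rows[0], rows[1:]
print(header)
print(records)
```

The same keyword arguments would be the ones to pass alongside header and row_names when the file does not use the default comma-separated layout.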
Lastly, you can tell the initializer to skip a certain number of lines using the skip parameter.
New in version 0.9.
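Skipping leading lines can be pictured with the standard library alone. This sketch is not how DataMatrix implements skip; read_skipping is a hypothetical helper and the sample data is invented for illustration:

```python
import csv
import io
from itertools import islice


def read_skipping(handle, skip=0, **csv_options):
    """Hypothetical helper: drop the first `skip` lines, then parse the rest."""
    remaining = islice(handle, skip, None)
    return list(csv.reader(remaining, **csv_options))


# Two comment lines precede the real header in this made-up file.
sample = io.StringIO("# comment line 1\n# comment line 2\nName,surname\nAlbert,Einstein\n")
rows = read_skipping(sample, skip=2)
print(rows)
```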
If you print a DataMatrix instance, you’ll get some basic information:
>>> print matrix
File name:
Column with identifier names: None (numeric)
No. of rows: 2
No. of columns: 2
Columns: Name, surname
With the columns attribute you can view the columns as a list:
>>> print matrix.columns
['Name', 'surname']
Row names can be printed instead with the rownames attribute.
You can access specific rows with the getRow method:
>>> matrix.getRow(1)
['1', 'Albert', 'Einstein']
Or specific columns with a dictionary-like syntax:
>>> matrix["surname"]
['Einstein', 'Marx']
Changed in version 0.8.
In DataMatrix versions prior to 0.8, the getColumn method was used. This is no longer the case: the method has been marked as deprecated and will be removed in future versions.
To get a representation of your data, there is the view method:
>>> matrix.view()
1 Albert Einstein
2 Groucho Marx
Columns and rows can be appended with the append and appendRow methods, respectively. In both cases, the item to be appended needs to be a sequence (list or tuple) and must be as long as the other columns (when appending columns) or cover all the columns (when appending rows):
>>> profession = ["scientist", "comedian"] # new column
>>> matrix.append(profession, "Job")
>>> entry = ["Isaac", "Asimov", "writer"] # new row
>>> matrix.appendRow(entry,"3")
Notice that when you append a column or a row you must pass a column or row name to the method, as the examples above show. Also, the rows and columns you are appending need to be of the same length as the rows (or columns) already present in the DataMatrix instance.
Alternatively, you can insert rows and columns at a specified position using the insert (for columns) and insertRow (for rows) methods. They behave exactly like the append* methods, with the difference that you must supply an integer argument (1 or greater) representing the column or row number:
>>> matrix.insert(profession,"Job",2)
>>> matrix.insertRow(entry,"3",1)
New in version 0.7.
If the number is greater than the number of columns or rows available, the method automatically defaults to the append variant. Again, rows and columns must be of the same length as the ones already present in the instance.
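The positional rule can be sketched on a plain list. insert_at is a hypothetical helper, not part of DataMatrix, and it assumes that the new item ends up at the given 1-based position:

```python
def insert_at(items, new_item, position):
    """Hypothetical helper mirroring the described rule on a plain list.

    position is 1-based; a value past the end falls back to appending.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position > len(items):
        items.append(new_item)
    else:
        items.insert(position - 1, new_item)
    return items


columns = ["Name", "surname"]
insert_at(columns, "Job", 2)       # becomes the second column
insert_at(columns, "Country", 10)  # past the end: appended instead
print(columns)
```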
New in version 0.9: You can bind multiple DataMatrix instances using the cbind and rbind functions, which join matrices by columns and by rows, respectively.
An example:
>>> new_matrix = datamatrix.cbind(matrix1, matrix2)
>>> new_matrix = datamatrix.rbind(matrix1, matrix2)
Attempting to bind matrices of unequal lengths (rows or columns, depending on the function used) will raise a ValueError exception.
New in version 0.9.
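The length check behind binding can be illustrated with lists of rows. bind_columns and bind_rows are hypothetical stand-ins written for this sketch; they are not the cbind and rbind functions themselves and say nothing about how those handle row or column names:

```python
def bind_columns(first, second):
    """Join two tables (lists of rows) side by side; row counts must match."""
    if len(first) != len(second):
        raise ValueError("tables have different numbers of rows")
    return [left + right for left, right in zip(first, second)]


def bind_rows(first, second):
    """Stack two tables vertically; column counts must match."""
    if first and second and len(first[0]) != len(second[0]):
        raise ValueError("tables have different numbers of columns")
    return first + second


names = [["Albert", "Einstein"], ["Groucho", "Marx"]]
jobs = [["scientist"], ["comedian"]]

wide = bind_columns(names, jobs)                # adds a third column
tall = bind_rows(names, [["Isaac", "Asimov"]])  # adds a third row
```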
You can generate subsets of your matrices by passing a list of columns to the subset function. The result is a new DataMatrix instance:
new_matrix = datamatrix.subset(old_matrix, ["Supplier", "Price"])
New in version 0.8.
For some special uses, a number of functions have been provided. elementApply applies a function to the whole matrix, matrixApply applies a function to either rows or columns, giving a single result, while filterMatrix can be used to filter rows depending on the content of a specific column. For further information, refer to the documentation strings of those functions.
You can also transpose the matrix (swap the rows and the columns) with the help of the transpose function.
Also, two convenience functions have been provided to quickly calculate the mean of rows or columns: they are meanRows and meanColumns, respectively.
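What transposition and the two means compute can be shown on plain nested lists. The helper names below are chosen for this sketch and are not the module's functions; it also assumes every cell is numeric:

```python
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*rows)]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))


data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

row_means = [mean(row) for row in data]                      # one value per row
column_means = [mean(column) for column in transpose(data)]  # one value per column
```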
New in version 0.9: The apply function can be used to apply a function to a specific column, either to each element, or to the column as a whole.
You can write DataMatrix objects to files or file-like objects with the writeMatrix function present in the module:
fh = open("somefile.txt","w")
datamatrix.writeMatrix(matrix,fh)
Output formatting is again set via options to the csv module. Optionally, you can save only some of the columns, specified as a list:
datamatrix.writeMatrix(matrix, fh, columns = ["Name","Job"])
If you want the header (column names) to be included, you need to set the header parameter to True:
datamatrix.writeMatrix(matrix, fh, header = True)
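The combination of column selection, optional header and csv formatting options can be sketched with csv.writer. write_table is a hypothetical stand-in that operates on plain lists, not the writeMatrix implementation, and it targets Python 3:

```python
import csv
import io


def write_table(rows, column_names, handle, columns=None, header=False, **csv_options):
    """Hypothetical stand-in: write selected columns, optionally with a header."""
    selected = columns if columns is not None else column_names
    indexes = [column_names.index(name) for name in selected]
    writer = csv.writer(handle, **csv_options)
    if header:
        writer.writerow(selected)
    for row in rows:
        writer.writerow([row[i] for i in indexes])


column_names = ["Name", "surname", "Job"]
rows = [["Albert", "Einstein", "scientist"],
        ["Groucho", "Marx", "comedian"]]

# An in-memory buffer stands in for an open file.
buffer = io.StringIO()
write_table(rows, column_names, buffer, columns=["Name", "Job"],
            header=True, delimiter="\t", lineterminator="\n")
output = buffer.getvalue()
```

When writing to a real file, closing the handle (or opening it in a with block) ensures the data is flushed to disk.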