Pythonic implementation of R’s data.frame structure.
This class loads a file or a file-like object and transforms it
into a dictionary that holds, for each column, a list of
(row name, value) tuples. Column values can optionally be retrieved
on their own, and single rows can be queried.
Missing values are recorded as NA.
The column used for row names (row_names) can be specified,
otherwise rows are numbered sequentially. If the file has
a header (default True), it is used to name the fields, otherwise the
fields are numbered sequentially, with the first field
being “x” (like R does).
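A minimal sketch of the loading step described above, assuming a tab-delimited file with a header. The helper load_columns is hypothetical and is not the class's own code; it only illustrates the documented shape of the result, one list of (row name, value) tuples per column, with NA standing in for missing values. It is written for Python 3.

```python
import csv
import io
from collections import OrderedDict


def load_columns(handle, delimiter="\t", row_names=None):
    """Return {column: [(row name, value), ...]} for a file with a header.

    Hypothetical helper for illustration only.
    """
    # restval fills in fields that are missing from short rows.
    reader = csv.DictReader(handle, delimiter=delimiter, restval="NA")
    fields = [name for name in reader.fieldnames if name != row_names]
    columns = OrderedDict((name, []) for name in fields)
    for number, record in enumerate(reader, start=1):
        # Use the chosen column as row name, else number rows sequentially.
        name = record[row_names] if row_names else str(number)
        for field in fields:
            value = record[field]
            # Empty fields are treated as missing as well.
            columns[field].append((name, value if value != "" else "NA"))
    return columns


sample = io.StringIO("gene\ta\tb\ng1\t3\t\ng2\t2\t5\n")
table = load_columns(sample, row_names="gene")
```

Here table["b"] holds the tuples for column b, keyed by the values of the gene column, and the empty field in the first data row comes back as NA.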
If you are loading a text file saved by R using row.names=TRUE, the
top-left field will be blank. To parse such files, set fixR to
True in the initializer options.
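To picture the problem that fixR addresses, the sketch below reads a file whose header starts with a blank field and gives that field a placeholder name. The function read_r_export and the placeholder "x" are assumptions made for illustration; the source does not say how the class handles this internally.

```python
import csv
import io


def read_r_export(handle, delimiter="\t", first_name="x"):
    """Read a delimited file whose top-left header field is blank.

    Hypothetical helper; first_name is an assumed placeholder.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    # A blank first header field is what marks this kind of R export.
    if header and header[0] == "":
        header[0] = first_name
    return [dict(zip(header, row)) for row in reader]


sample = io.StringIO("\ta\tb\nr1\t1\t2\nr2\t3\t4\n")
records = read_r_export(sample)
```

Each record then maps the placeholder name to the row name written by R, next to the regular columns.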
Other options, including the delimiter, line terminator and quoting,
are passed directly to the csv.DictReader instance that
reads the file. See the csv module documentation for more details.
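As a rough illustration of what such options do once they reach the reader, the snippet below calls csv.DictReader directly on the same text with two quoting settings. It does not use the class itself, and the sample text is invented for the example.

```python
import csv
import io

text = 'name,comment\n"a","x, y"\n'

# Default-style quoting: quotes are stripped and may protect the delimiter.
quoted = list(csv.DictReader(io.StringIO(text), delimiter=",",
                             quoting=csv.QUOTE_MINIMAL))

# QUOTE_NONE: quote characters are kept as ordinary data.
raw = list(csv.DictReader(io.StringIO(text), delimiter=",",
                          quoting=csv.QUOTE_NONE))
```

With QUOTE_NONE the comma inside the quoted comment splits the field, so the choice of quoting matters for files that contain the delimiter in their values. According to the csv module documentation, readers recognise "\r" and "\n" as line endings and ignore the lineterminator setting.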
Notable methods are:
- getRow - returns a specific row
- view - outputs a tab-delimited view of a number of
lines. The start line and the number of lines shown are configurable.
- append - adds a column (of the same length and with the same
identifiers) to the matrix, roughly equivalent to R’s cbind.
- appendRow - adds a row at the end of the matrix, with one value
per column. It can be seen as similar to R’s rbind.
- insert - inserts a column at a specified index
- insertRow - similar to insert, but works with rows
- iterRows - iterates over the rows
- getRowByID - gets the row with a specified row name
-
append(other, column_name)
- Method to append a column. It needs a sequence (tuple or list) and
a column name to be supplied. The sequence must be of the same length
as the other columns.
-
appendRow(other, row_name)
- Appends a row to the end of the matrix. The row must span all
the columns (i.e., it must contain one value per column).
The row name is specified in the mandatory parameter row_name.
-
getColumn(key, column_name=False)
- Gets a specific column, without the row
identifiers. The result is returned as a list.
Optionally the column name can be printed.
DEPRECATED: Use datamatrix[colname] instead.
-
getRow(row_number, columns='all', row_name=True)
- Returns a specific row, identified by its
row number, as a list. The columns parameter controls which
columns are returned (default: all).
-
getRowByID(rowId, **kwargs)
- Fetches a specific row based on its identifier. If there is no
match, a ValueError is raised.
-
insert(other, column_name, column_no)
- Method to insert a column at a specified column index.
-
insertRow(other, row_name, lineno)
- Method that inserts a row at a specified line number.
-
iterRows(**kwargs)
- Iterates over a matrix’s rows.
>>> from StringIO import StringIO
>>> matrixfile = StringIO(
... '''a b c d
... 3 3 3 3
... 2 2 2 2''')
>>> matrix = DataMatrix(matrixfile)
>>> for row in matrix.iterRows():
... print row
['1', '3 3 3 3']
['2', '2 2 2 2']
-
pop(index=-1)
- Method analogous to the pop method of lists, with the difference
that this one removes rows and returns the removed item. If no index
(a.k.a. row number) is supplied, the last item is removed.
-
replace(other, colName)
- Replace a column with another.
-
view(lines=10, start_at=1, *args, **kwargs)
- Prints the table on screen.
The number of lines and the starting line can be
configured via the lines and start_at parameters.
Additional arguments are passed on to getRow to
select which columns are printed.
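The row and column methods documented above can be pictured with a small stand-in. ToyMatrix below is a hypothetical class written for illustration, not the DataMatrix source. It assumes 0-based positions for insert and insertRow and raises ValueError on a length mismatch; neither detail is confirmed by the documentation, which only states that getRowByID raises ValueError when nothing matches.

```python
class ToyMatrix(object):
    """Hypothetical stand-in for illustration only."""

    def __init__(self, columns, rows):
        self.columns = list(columns)  # column names, in order
        # Each row is kept as (row name, list of values).
        self.rows = [(name, list(values)) for name, values in rows]

    def _check(self, other, expected, what):
        # Assumed behaviour: reject sequences of the wrong length.
        if len(other) != expected:
            raise ValueError("%s needs %d items, got %d"
                             % (what, expected, len(other)))

    def append(self, other, column_name):
        # Like R's cbind: a new column goes after the last one.
        self.insert(other, column_name, len(self.columns))

    def insert(self, other, column_name, column_no):
        self._check(other, len(self.rows), "column")
        self.columns.insert(column_no, column_name)
        for (_, values), item in zip(self.rows, other):
            values.insert(column_no, item)

    def appendRow(self, other, row_name):
        # Like R's rbind: a new row goes after the last one.
        self.insertRow(other, row_name, len(self.rows))

    def insertRow(self, other, row_name, lineno):
        self._check(other, len(self.columns), "row")
        self.rows.insert(lineno, (row_name, list(other)))

    def pop(self, index=-1):
        # Removes a row and returns it, like list.pop.
        return self.rows.pop(index)

    def getRowByID(self, rowId):
        for name, values in self.rows:
            if name == rowId:
                return [name] + values
        raise ValueError("no row named %r" % (rowId,))


matrix = ToyMatrix(["a", "b"], [("1", [3, 3]), ("2", [2, 2])])
matrix.append([9, 8], "c")          # add column c at the end
matrix.appendRow([1, 1, 1], "3")    # add row "3" at the end
matrix.insert([0, 0, 0], "z", 0)    # put column z first
removed = matrix.pop()              # remove the last row again
```

Routing append through insert, and appendRow through insertRow, keeps the length check in one place; the real class may be organised differently.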