loompy package¶
Submodules¶
loompy.loompy module¶
-
class
loompy.loompy.
LoomConnection
(filename: str, mode: str = 'r+') → None[source]¶ Bases:
object
-
add_columns
(submatrix: numpy.ndarray, col_attrs: typing.Dict[str, numpy.ndarray], fill_values: typing.Dict[str, numpy.ndarray] = None) → None[source]¶ Add columns of data and attribute values to the dataset.
Parameters: - submatrix (dict or numpy.ndarray) – Either: (1) an N-by-M matrix of float32 values (N rows, M columns), in which case the columns are added to the default layer, or (2) a dict {layer_name: matrix}, in which case each (N, M) matrix is added to the layer layer_name
- col_attrs (dict) – Column attributes, where keys are attribute names and values are numpy arrays (float or string) of length M
Returns: Nothing.
Notes
- This will modify the underlying HDF5 file, which will interfere with any concurrent readers.
- Column attributes present in the file but NOT provided here will be deleted.
- Arrays containing NaN values should not be provided.
-
add_loom
(other_file: str, key: str = None, fill_values: typing.Dict[str, numpy.ndarray] = None, batch_size: int = 1000) → None[source]¶ Add the content of another loom file
Parameters: - other_file (str) – filename of the loom file to append
- key (str) – Row attribute to use to verify row ordering
- fill_values (dict) – default values to use for missing attributes (or None to drop missing attrs, or ‘auto’ to fill with sensible defaults)
- batch_size (int) – the batch size used by batchscan (limits the number of rows/columns read in memory)
Returns: Nothing, but appends the contents of the other loom file. Note that the other loom file must have exactly the same number of rows and exactly the same set of column attributes. All content is appended, including layers, but layers in other_file that are not already present in self are ignored.
-
batch_scan
(cells: numpy.ndarray = None, genes: numpy.ndarray = None, axis: int = 0, batch_size: int = 1000, layer: str = None) → typing.Iterable[typing.Tuple[[int, numpy.ndarray], numpy.ndarray]][source]¶ Performs a batch scan of the loom file
Parameters: - cells (np.ndarray) – the indexes [1,2,3,..,1000] of the cells to select
- genes (np.ndarray) – the indexes [1,2,3,..,1000] of the genes to select
- axis (int) – 0:rows or 1:cols
- batch_size (int) – the number of rows/columns returned in each chunk of the iterator
Returns: - Iterable that yields triplets
- (ix, indexes, vals)
- ix (int) – first position, i.e. how many rows/cols have been yielded already
- indexes (np.ndarray[int]) – the indexes, using the same numbering as the input arguments cells / genes (i.e. np.arange(ds.shape[axis])); this is ix + selection
- vals (np.ndarray) – the matrix corresponding to the chunk
-
batch_scan_layers
(cells: numpy.ndarray = None, genes: numpy.ndarray = None, axis: int = 0, batch_size: int = 1000, layers: typing.Iterable = None) → typing.Iterable[typing.Tuple[[int, numpy.ndarray], typing.Dict]][source]¶ Performs a batch scan of the loom file dealing with multiple layer files
Parameters: - cells (np.ndarray) – the indexes [1,2,3,..,1000] of the cells to select
- genes (np.ndarray) – the indexes [1,2,3,..,1000] of the genes to select
- axis (int) – 0:rows or 1:cols
- batch_size (int) – the number of rows/columns returned in each chunk of the iterator
- layers (iterable) – if specified, only the named layers of the loom file are scanned; e.g. if layers = [“”], batch_scan_layers is equivalent to batch_scan
Returns: - Iterable that yields triplets
- (ix, indexes, vals)
- ix (int) – first position, i.e. how many rows/cols have been yielded already
- indexes (np.ndarray[int]) – the indexes, using the same numbering as the input arguments cells / genes (i.e. np.arange(ds.shape[axis])); this is ix + selection
- vals (Dict[layername, np.ndarray]) – a dictionary mapping each layer name to the matrix corresponding to that layer’s chunk
-
delete_attr
(name: str, axis: int = 0, raise_on_missing: bool = True) → None[source]¶ Permanently delete an existing attribute and all its values
Parameters: - name (str) – Name of the attribute to remove
- axis (int) – Axis of the attribute (0 = rows, 1 = columns)
Returns: Nothing.
-
get_edges
(name: str, axis: int) → typing.Tuple[[numpy.ndarray, numpy.ndarray], numpy.ndarray][source]¶
-
map
(f_list: typing.List[typing.Callable[numpy.ndarray, int]], axis: int = 0, chunksize: int = 1000, selection: numpy.ndarray = None) → typing.List[numpy.ndarray][source]¶ Apply a function along an axis without loading the entire dataset in memory.
Parameters: - f_list (list of func) – Function(s) that take a numpy ndarray as argument
- axis (int) – Axis along which to apply the function (0 = rows, 1 = columns)
- chunksize (int) – Number of rows (columns) to load per chunk
- selection (array of bool) – Columns (rows) to include
Returns: numpy.ndarray result of function application
If you supply a list of functions, the result will be a list of numpy arrays. This is more efficient than repeatedly calling map() one function at a time.
-
permute
(ordering: numpy.ndarray, axis: int) → None[source]¶ Permute the dataset along the indicated axis.
Parameters: - ordering (list of int) – The desired order along the axis
- axis (int) – The axis along which to permute
Returns: Nothing.
-
set_attr
(name: str, values: numpy.ndarray, axis: int = 0, dtype: str = None) → None[source]¶ Create or modify an attribute.
Parameters: - name (str) – Name of the attribute
- values (numpy.ndarray) – Array of values of length equal to the axis length
- axis (int) – Axis of the attribute (0 = rows, 1 = columns)
Returns: Nothing.
This will overwrite any existing attribute of the same name.
-
set_edges
(name: str, a: numpy.ndarray, b: numpy.ndarray, w: numpy.ndarray, axis: int) → None[source]¶
-
-
class
loompy.loompy.
LoomLayer
(ds: loompy.loompy.LoomConnection, name: str, dtype: str) → None[source]¶ Bases:
object
-
resize
(size: typing.Tuple[int, int], axis: int = None) → None[source]¶ Resize the dataset, or the specified axis.
The dataset must be stored in chunked format; it can be resized up to the “maximum shape” (keyword maxshape) specified at creation time. The rank of the dataset cannot be changed. “Size” should be a shape tuple, or if an axis is specified, an integer.
BEWARE: This functions differently than the NumPy resize() method! The data is not “reshuffled” to fit in the new shape; each axis is grown or shrunk independently. The coordinates of existing data are fixed.
-
-
loompy.loompy.
combine
(files: typing.List[str], output_file: str, key: str = None, file_attrs: typing.Dict[str, str] = None, batch_size: int = 1000) → None[source]¶ Combine two or more loom files and save as a new loom file
Parameters: - files (list of str) – the list of input files (full paths)
- output_file (str) – full path of the output loom file
- key (string) – Row attribute to use to verify row ordering
- file_attrs (dict) – file attributes (title, description, url, etc.)
- batch_size (int) – limits the number of rows/columns read into memory per batch (default: 1000)
Returns: Nothing, but creates a new loom file combining the input files.
The input files must (1) have exactly the same number of rows, (2) have exactly the same sets of row and column attributes.
-
loompy.loompy.
connect
(filename: str, mode: str = 'r+') → loompy.loompy.LoomConnection[source]¶ Establish a connection to a .loom file.
Parameters: - filename (str) – Name of the .loom file to open
- mode (str) – read/write mode, accepts ‘r+’ (read/write) or ‘r’ (read-only), defaults to ‘r+’
Returns: A LoomConnection instance.
-
loompy.loompy.
create
(filename: str, matrix: numpy.ndarray, row_attrs: typing.Dict[str, numpy.ndarray], col_attrs: typing.Dict[str, numpy.ndarray], file_attrs: typing.Dict[str, str] = None, chunks: typing.Tuple[int, int] = (64, 64), chunk_cache: int = 512, dtype: str = 'float32', compression_opts: int = 2) → loompy.loompy.LoomConnection[source]¶ Create a new .loom file from the given data.
Parameters: - filename (str) – The filename (typically using a .loom file extension)
- matrix (numpy.ndarray) – Two-dimensional (N-by-M) numpy ndarray of float values
- row_attrs (dict) – Row attributes, where keys are attribute names and values are numpy arrays (float or string) of length N
- col_attrs (dict) – Column attributes, where keys are attribute names and values are numpy arrays (float or string) of length M
- chunks (tuple) – The chunking of the matrix. Small chunks are slow when loading a large batch of rows/columns in sequence, but fast for single column/row retrieval. Defaults to (64,64).
- chunk_cache (int) – Sets the chunk cache used by the HDF5 format inside the loom file, in MB. If the cache is too small to contain all chunks of a row/column in memory, then sequential row/column access will be a lot slower. Defaults to 512.
- dtype (str) – Dtype of the matrix. Default float32 (uint16, float16 could be used)
- compression_opts (int) – Strength of the gzip compression. Default: 2.
Returns: LoomConnection to created loom file.
-
loompy.loompy.
create_from_cellranger
(indir: str, outdir: str = None, genome: str = None) → loompy.loompy.LoomConnection[source]¶ Create a .loom file from 10X Genomics cellranger output
Parameters: - indir (str) – path to the cellranger output folder (the one that contains ‘outs’)
- outdir (str) – output folder where the new loom file should be saved (defaults to indir)
- genome (str) – genome build to load (e.g. ‘mm10’; default: determine species from outs folder)
Returns: Nothing, but creates the new loom file