itpseq.DataSet
- class itpseq.DataSet(data_path: Path = '.', result_path: Path | None = None, samples: dict | None = None, keys=None, ref_labels: str | tuple | None = 'noa', cache_path=None, file_pattern=None, aafile_pattern=None)
Loads an iTP-Seq dataset and provides methods for analyzing and visualizing the data.
A DataSet object handles iTP-Seq Samples with their respective Replicates. By default, it infers the files to use in the provided directory by looking for “*.processed.json” files produced during the initial pre-processing and filtering of the fastq files. It uses the pattern of the file names to group the Replicates into Samples and to define which condition is the reference in the DataSet (the Sample named “noa” by default).
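The grouping described above can be illustrated with a minimal sketch that uses only the standard library. It applies the default pattern documented for file_pattern to a list of file names, collects replicate numbers per (lib_type, sample) pair, and flags the samples labelled “noa” as references. The helper group_replicates and its return format are hypothetical and are not part of itpseq; the library's actual implementation may differ.

```python
import re
from collections import defaultdict

# Default file_pattern as documented for DataSet, with backslashes written out.
FILE_PATTERN = re.compile(
    r"(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)\.processed\.json"
)


def group_replicates(filenames, ref_label="noa"):
    """Group file names by (lib_type, sample) and list the reference samples.

    Hypothetical helper for illustration only; not an itpseq function.
    """
    groups = defaultdict(list)
    for name in filenames:
        m = FILE_PATTERN.fullmatch(name)
        if m is None:
            continue  # skip files that are not pre-processing outputs
        groups[(m["lib_type"], m["sample"])].append(int(m["replicate"]))
    samples = {key: sorted(reps) for key, reps in groups.items()}
    # Assumption: a sample is a reference when its "sample" field equals ref_label.
    references = [key for key in samples if key[1] == ref_label]
    return samples, references


files = [
    "nnn15_noa1.processed.json",
    "nnn15_noa2.processed.json",
    "nnn15_noa3.processed.json",
    "nnn15_tcx1.processed.json",
    "nnn15_tcx2.processed.json",
    "nnn15_tcx3.processed.json",
    "notes.txt",
]
samples, references = group_replicates(files)
print(samples)
print(references)
```

Files that do not match the pattern, such as notes.txt here, are simply ignored in this sketch.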
- data_path
Path to the data directory containing the output files from the fastq pre-processing.
- Type:
str or Path
- result_path
Path to the directory where the results of the analysis will be saved.
- Type:
str or Path
- samples
Dictionary of Samples in the DataSet. By default, it is None and will be populated automatically.
- Type:
dict or None
- keys
Properties in the file name to use for identifying the reference.
- Type:
tuple
- ref_labels
Specifies the reference, e.g. 'noa' or (('sample', 'noa'),).
- Type:
str or tuple
- cache_path
Path used to cache intermediate results. By default, this creates a subdirectory called “cache” in the result_path directory.
- Type:
str or Path
- file_pattern
Regex pattern used to identify the sample files in the data_path directory. If None, defaults to r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)\.processed\.json', which matches files like nnn15_noa1.processed.json, nnn15_tcx2.processed.json, etc.
- Type:
str
- aafile_pattern
Pattern used to identify the amino acid files in the data_path directory. The file names are constructed from the values captured by the file_pattern regex. If None, defaults to '{lib_type}_{sample}{replicate}_aa.processed.txt'.
- Type:
str
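As a rough illustration of how the two default patterns relate, the sketch below matches a JSON file name against the default file_pattern and feeds the captured groups into the default aafile_pattern with str.format. The helper aa_file_for is hypothetical and only mirrors the documented behavior; it is not an itpseq function.

```python
import re

# Defaults as documented for DataSet.
FILE_PATTERN = r"(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)\.processed\.json"
AAFILE_PATTERN = "{lib_type}_{sample}{replicate}_aa.processed.txt"


def aa_file_for(json_name):
    """Derive the amino acid file name from a JSON file name.

    Hypothetical helper for illustration only; not an itpseq function.
    """
    m = re.fullmatch(FILE_PATTERN, json_name)
    if m is None:
        raise ValueError(f"{json_name!r} does not match file_pattern")
    # The named groups (lib_type, sample, replicate) fill the template fields.
    return AAFILE_PATTERN.format(**m.groupdict())


aa_name = aa_file_for("nnn15_tcx2.processed.json")
print(aa_name)
```

Under this reading, a custom file_pattern should presumably capture every field that the aafile_pattern template refers to.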
Examples
Creating a DataSet from a simple experiment comparing antibiotic treatment (tcx) with no treatment (noa), with 3 replicates each (1, 2, 3).
- Load a dataset from the current directory, inferring the samples automatically.
>>> from itpseq import DataSet
>>> data = DataSet(data_path='.')
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\\d]+)(?P<replicate>\\d+)\\.processed\\.json',
        samples=[Sample(nnn15.noa:[1, 2, 3]),
                 Sample(nnn15.tcx:[1, 2, 3], ref: nnn15.noa)],
        )
- Same as above, but only use “sample” as key.
>>> data = DataSet(data_path='.', keys=['sample'])
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\\d]+)(?P<replicate>\\d+)\\.processed\\.json',
        samples=[Sample(noa:[1, 2, 3]),
                 Sample(tcx:[1, 2, 3], ref: noa)],
        )
- Compute a standard report and export it as PDF
>>> data.report('my_experiment.pdf')
- Display a graph of the inverse-toeprint lengths for each sample
>>> data.itp_len_plot(row='sample')
- Attributes:
- samples_with_ref
- __init__(data_path: Path = '.', result_path: Path | None = None, samples: dict | None = None, keys=None, ref_labels: str | tuple | None = 'noa', cache_path=None, file_pattern=None, aafile_pattern=None)
Methods
DE([pos])
Computes the log2-FoldChange for each motif described by pos for each sample in the DataSet relative to their reference.
__init__([data_path, result_path, samples, ...])
infos([html])
Displays summary information about the dataset NGS reads per replicate.
itoeprint([plot, norm, norm_range, ...])
itp_len_plot([ax, col, row, min_codon, ...])
reorder_samples(order[, validate, ...])
report([template, output])
Attributes
samples_with_ref
toeprint_df