itpseq.DataSet

class itpseq.DataSet(data_path: Path = '.', result_path: Path | None = None, samples: dict | None = None, keys=None, ref_labels: str | tuple | None = 'noa', cache_path=None, file_pattern=None, aafile_pattern=None)

Loads an iTP-Seq dataset and provides methods for analyzing and visualizing the data.

A DataSet object is constructed to handle iTP-Seq Samples with their respective Replicates. By default, it infers the files to use in the provided directory by looking for “*.processed.json” files produced during the initial step of pre-processing and filtering the fastq files. It uses the pattern of the file names to group the Replicates into a Sample, and to define which condition is the reference in the DataSet (the Sample with name “noa” by default).
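The grouping described above can be pictured with a short sketch. This is not the library's implementation: it only applies the documented default file_pattern to a hypothetical list of file names and collects the replicate numbers per (lib_type, sample) pair. The names `DEFAULT_PATTERN` and `group_replicates`, and the use of `fullmatch`, are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Default pattern as documented for the file_pattern argument.
DEFAULT_PATTERN = re.compile(
    r"(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)\.processed\.json"
)


def group_replicates(filenames, pattern=DEFAULT_PATTERN):
    """Group file names into {(lib_type, sample): [replicate, ...]}.

    Hypothetical helper: it mimics the documented behaviour, not itpseq's code.
    """
    groups = defaultdict(list)
    for name in filenames:
        match = pattern.fullmatch(name)
        if match is None:
            continue  # skip anything that is not a pre-processing output file
        key = (match["lib_type"], match["sample"])
        groups[key].append(int(match["replicate"]))
    return {key: sorted(reps) for key, reps in groups.items()}


# Hypothetical directory listing for a tcx vs noa experiment.
files = [
    "nnn15_noa1.processed.json", "nnn15_noa2.processed.json",
    "nnn15_noa3.processed.json", "nnn15_tcx1.processed.json",
    "nnn15_tcx2.processed.json", "nnn15_tcx3.processed.json",
    "README.md",
]
groups = group_replicates(files)

# With the default ref_labels='noa', the reference is the sample named "noa".
reference = [key for key in groups if key[1] == "noa"]
```

Here `groups` holds one entry per sample with its replicate numbers, and `reference` lists the keys whose sample name is "noa".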

data_path

Path to the data directory containing the output files from the fastq pre-processing.

Type:

str or Path

result_path

Path to the directory where the results of the analysis will be saved.

Type:

str or Path

samples

Dictionary of Samples in the DataSet. By default, it is None and will be populated automatically.

Type:

dict or None

keys

Properties in the file name to use for identifying the reference.

Type:

tuple

ref_labels

Specifies the reference, e.g. 'noa' or (('sample', 'noa'),).

Type:

str or tuple

cache_path

Path used to cache intermediate results. By default, this creates a subdirectory called “cache” in the result_path directory.

Type:

str or Path

file_pattern

Regex pattern used to identify the sample files in the data_path directory. If None, defaults to r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)\.processed\.json' which matches files like nnn15_noa1.processed.json, nnn15_tcx2.processed.json, etc.

Type:

str

aafile_pattern

Pattern used to identify the amino acid files in the data_path directory. It will use the values captured in the file_pattern regex to construct the file names. If None, defaults to ‘{lib_type}_{sample}{replicate}_aa.processed.txt’

Type:

str
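As a rough illustration of how file_pattern and aafile_pattern relate, the sketch below takes the named groups captured by the default regex and substitutes them into the default amino acid file template. The helper name `aa_file_for` is hypothetical, and whether the library matches with `fullmatch` or another `re` function is an assumption.

```python
import re

# Documented defaults for the two arguments.
file_pattern = (
    r"(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)\.processed\.json"
)
aafile_pattern = "{lib_type}_{sample}{replicate}_aa.processed.txt"


def aa_file_for(json_name):
    """Return the amino acid file name derived from a JSON file name.

    Hypothetical helper; returns None when the name does not match.
    """
    match = re.fullmatch(file_pattern, json_name)
    if match is None:
        return None
    # groupdict() yields {'lib_type': ..., 'sample': ..., 'replicate': ...},
    # which are exactly the fields used by the template.
    return aafile_pattern.format(**match.groupdict())


aa_name = aa_file_for("nnn15_tcx2.processed.json")
unmatched = aa_file_for("notes.txt")
```

A custom file_pattern would presumably need to capture every field that aafile_pattern refers to, otherwise the substitution would fail with a KeyError in a sketch like this one.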

Examples

Creating a DataSet from a simple antibiotic treatment (tcx) vs no treatment (noa) with 3 replicates each (1, 2, 3).

Load a dataset from the current directory, inferring the samples automatically:

>>> from itpseq import DataSet
>>> data = DataSet(data_path='.')
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\\d]+)(?P<replicate>\\d+)\\.processed\\.json',
        samples=[Sample(nnn15.noa:[1, 2, 3]),
                 Sample(nnn15.tcx:[1, 2, 3], ref: nnn15.noa)],
        )

Same as above, but only use “sample” as key:

>>> data = DataSet(data_path='.', keys=['sample'])
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\\d]+)(?P<replicate>\\d+)\\.processed\\.json',
        samples=[Sample(noa:[1, 2, 3]),
                 Sample(tcx:[1, 2, 3], ref: noa)],
        )

Compute a standard report and export it as PDF:

>>> data.report('my_experiment.pdf')

Display a graph of the inverse-toeprint lengths for each sample:

>>> data.itp_len_plot(row='sample')

__init__(data_path: Path = '.', result_path: Path | None = None, samples: dict | None = None, keys=None, ref_labels: str | tuple | None = 'noa', cache_path=None, file_pattern=None, aafile_pattern=None)

Methods

DE([pos])

Computes the log2 fold change of each motif described by pos, for each sample in the DataSet relative to its reference.

__init__([data_path, result_path, samples, ...])

infos([html])

Displays summary information about the dataset NGS reads per replicate.

itoeprint([plot, norm, norm_range, ...])

itp_len_plot([ax, col, row, min_codon, ...])

reorder_samples(order[, validate, ...])

report([template, output])
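To make the quantity reported by DE more concrete, here is a minimal sketch of a log2 fold change between a sample and its reference. It is not how DE is implemented: the normalization by total counts, the pseudocount, the function name `log2_fold_change` and the motif counts are all assumptions chosen for illustration, and the library may well rely on a proper statistical framework instead.

```python
import math


def log2_fold_change(sample_counts, ref_counts, pseudocount=1.0):
    """Naive per-motif log2 fold change (illustrative, not itpseq's method).

    Counts are scaled by the total number of reads in each condition, and a
    pseudocount avoids division by zero for motifs absent from one condition.
    """
    sample_total = sum(sample_counts.values())
    ref_total = sum(ref_counts.values())
    result = {}
    for motif in sorted(set(sample_counts) | set(ref_counts)):
        sample_freq = (sample_counts.get(motif, 0) + pseudocount) / sample_total
        ref_freq = (ref_counts.get(motif, 0) + pseudocount) / ref_total
        result[motif] = math.log2(sample_freq / ref_freq)
    return result


# Made-up motif counts for a treated sample (tcx) and its reference (noa).
tcx = {"PP": 30, "AK": 10}
noa = {"PP": 10, "AK": 30}
lfc = log2_fold_change(tcx, noa, pseudocount=0.0)
```

A positive value means the motif is enriched in the sample relative to the reference, and a negative value means it is depleted.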

Attributes

samples_with_ref

toeprint_df