mpa.SimulateSort
Overview
SimulateSort is a class within the mpathic package that simulates a Sort-Seq experiment.
Usage
>>> import mpathic
>>> loader = mpathic.io
>>> mp_df = loader.load_model('./mpathic/data/sortseq/full-0/crp_model.txt')
>>> filename = "./mpathic/data/sortseq/full-0/data_small.txt"
>>> df = loader.load_dataset(filename)
>>> mpathic.simulate_sort_class(df=df, mp=mp_df)
Example Input and Output
The input table to this function must contain sequence (seq), count (ct), and energy (val) columns.
Example Input Table:
seq    ct  val
AGGTA  5   -0.4
AGTTA  1   -0.2
...
Example Output Table:
seq    ct  val   ct_1  ct_2  ct_3  ...
AGGTA  5   -0.4  1     2     1
AGTTA  1   -0.2  0     1     0
...
The output table contains all of the original columns, along with one count column per sorted bin (ct_1, ct_2, …).
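To make the relationship between the two tables concrete, here is a minimal toy sketch in plain Python. It is not mpathic's implementation: the function name toy_sort, the evenly spaced bin edges, the Gaussian noise, and the use of dictionaries in place of data frame rows are all assumptions made for illustration. In particular, this toy version places every counted cell in exactly one bin, a simplification that the real output need not follow.

```python
import random

# Columns the input table must provide, as stated above.
REQUIRED_COLUMNS = ("seq", "ct", "val")


def toy_sort(rows, nbins=3, noise_width=0.2, seed=0):
    """Distribute each row's ct cells over nbins bins (toy model, NOT mpathic's)."""
    rng = random.Random(seed)
    for row in rows:
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"input row is missing columns: {missing}")

    # Assumption: bin edges are evenly spaced between the smallest and largest val.
    vals = [row["val"] for row in rows]
    lo, hi = min(vals), max(vals)
    width = (hi - lo) / nbins or 1.0  # fall back to 1.0 if all vals are equal

    out = []
    for row in rows:
        counts = [0] * nbins
        for _ in range(row["ct"]):
            # Assumption: each cell's signal is val plus Gaussian noise.
            signal = row["val"] + rng.gauss(0.0, noise_width)
            index = int((signal - lo) / width)
            index = min(max(index, 0), nbins - 1)  # clamp into a valid bin
            counts[index] += 1
        new_row = dict(row)  # keep all original columns
        for i, c in enumerate(counts, start=1):
            new_row[f"ct_{i}"] = c
        out.append(new_row)
    return out


rows = [
    {"seq": "AGGTA", "ct": 5, "val": -0.4},
    {"seq": "AGTTA", "ct": 1, "val": -0.2},
]
sorted_rows = toy_sort(rows)
for r in sorted_rows:
    print(r)
```

Each returned row keeps seq, ct, and val and gains ct_1 through ct_3, mirroring the shape of the example output table.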
Class Details
class mpathic.src.simulate_sort.SimulateSort(df=None, mp=None, noisetype='None', npar=[0.2], nbins=3, sequence_library=True, start=0, end=None, chunksize=10)
Parameters:
- df: (pandas dataframe)
Input data frame.
- mp: (pandas dataframe)
Model data frame.
- noisetype: (string, None)
String indicating what type of noise to include. Valid choices are None, 'Normal', 'LogNormal', and 'Plasmid'.
- npar: (list)
Parameters that accompany noisetype. For example, for noisetype 'Normal', npar must contain the width of the normal distribution.
- nbins: (int)
Number of bins that the different variants will get sorted into.
- sequence_library: (bool)
If True, sequencing of the library is simulated in bin zero.
- start: (int)
Position at which the analyzed region starts.
- end: (int)
Position at which the analyzed region ends.
- chunksize: (int)
Size of the chunks in which the data frame df is traversed.
Attributes:
- output_df: (pandas dataframe)
Contains the output of the SimulateSort constructor.
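The start, end, and chunksize parameters can be pictured with a short plain-Python sketch. The helper names slice_region and iter_chunks are hypothetical and are not part of mpathic. The sketch also assumes that start and end behave like Python slice bounds (end exclusive), which the parameter descriptions above do not confirm.

```python
def slice_region(seq, start=0, end=None):
    """Return the analyzed region of a sequence.

    Assumption: start/end follow Python slice semantics, i.e. seq[start:end].
    """
    return seq[start:end]


def iter_chunks(rows, chunksize=10):
    """Yield consecutive lists of at most chunksize rows."""
    for offset in range(0, len(rows), chunksize):
        yield rows[offset:offset + chunksize]


seqs = ["AGGTA", "AGTTA", "ACGTA", "TGGTA", "AGGTC"]

# Restrict every sequence to positions 1..3 (end=4 is exclusive here).
regions = [slice_region(s, start=1, end=4) for s in seqs]

# Traverse the table two rows at a time instead of all at once.
chunks = list(iter_chunks(regions, chunksize=2))
print(chunks)
```

Processing in chunks bounds how many rows are handled at once, which is presumably why the constructor exposes chunksize for large data frames.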