pytransit package¶
Subpackages¶
- pytransit.analysis package
- Submodules
- pytransit.analysis.base module
- pytransit.analysis.binomial module
- pytransit.analysis.example module
- pytransit.analysis.griffin module
- pytransit.analysis.gumbel module
- pytransit.analysis.hmm module
- pytransit.analysis.rankproduct module
- pytransit.analysis.resampling module
- pytransit.analysis.tn5gaps module
- pytransit.analysis.anova module
- Module contents
Submodules¶
pytransit.draw_trash module¶
-
pytransit.draw_trash.
draw_canvas
(fulldata, position, hash, orf2data, feature_hashes, feature_data, labels=[], min_read=0, scale=[500], globalScale=False, start=1, end=500, canvas_h=-1, canvas_w=1000)[source]¶
-
pytransit.draw_trash.
draw_features
(draw, GENES, orf2data, start, end, start_x, start_y, width, height)[source]¶
-
pytransit.draw_trash.
draw_genes
(draw, GENES, orf2data, start, end, start_x, start_y, width, height, doTriangle=True)[source]¶
-
pytransit.draw_trash.
draw_reads
(draw, reads, ta_sites, start_x=0, start_y=0, width=400, height=100, start=0, end=500, min_read=0, max_read=500, lwd=2)[source]¶
pytransit.fileDisplay module¶
-
class
pytransit.fileDisplay.
SortableListCtrl
(parent, ID=-1, pos=wx.Point(-1, -1), size=wx.Size(-1, -1), style=0)[source]¶ Bases:
wx._core.ListCtrl
-
class
pytransit.fileDisplay.
TransitGridFrame
(parent, path, size=(-1, -1))[source]¶ Bases:
wx._core.Frame
-
class
pytransit.fileDisplay.
TransitTable
(data, colnames)[source]¶ Bases:
wx._grid.GridTableBase
A custom wx.Grid Table using user supplied data
-
GetValue
(row, col) → PyObject[source]¶ Must be overridden to implement accessing the table values as text.
-
pytransit.images module¶
pytransit.norm_tools module¶
-
class
pytransit.norm_tools.
AdaptiveBGCNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'aBGC'¶
-
static
normalize
(data, wigList=[], annotationPath='', doTotReads=True, bgsamples=200000)[source]¶ Returns the normalized data using the aBGC method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
doTotReads (bool) – Boolean specifying whether to do TTR normalization as well.
bgsamples (int) – Integeer specifying how many samples to take.
- Returns
Array with the normalized data.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> normdata = norm_tools.aBGC_norm(data) >>> print(normdata) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]])
See also
-
-
class
pytransit.norm_tools.
BetaGeomNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'betageom'¶
-
static
normalize
(data, wigList=[], annotationPath='', doTTR=True, bgsamples=200000)[source]¶ Returns normalized data according to the BGC method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
doTTR (bool) – Boolean specifying whether to do TTR norm as well.
bgsamples (int) – Integer specifying how many samples to take.
- Returns
Array with the data normalized using the betageom method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> normdata = norm_tools.betageom_norm(data) >>> print(normdata) [[ 0. 0. 0. ..., 0. 0. 0.] [ 0. 0. 0. ..., 0. 0. 0.]]
See also
-
-
class
pytransit.norm_tools.
EmpHistNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'emphist'¶
-
static
normalize
(data, wigList=[], annotationPath='')[source]¶ Returns the normalized data, using the empirical hist method.
- Parameters
wigList (list) – List of paths to wig formatted datasets.
annotationPath (str) – Path to annotation in .prot_table or GFF3 format.
- Returns
Array with the normalization factors for the emphist method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.emphist_factors(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"], "transit/genomes/H37Rv.prot_table") >>> print(factors) array([[ 1. ], [ 0.63464722]])
See also
-
-
pytransit.norm_tools.
Fzinfnb
(params, args)[source]¶ Objective function for the zero-inflated NB method.
-
class
pytransit.norm_tools.
NZMeanNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'nzmean'¶
-
static
normalize
(data, wigList=[], annotationPath='')[source]¶ Returns the normalization factors for the data, using the NZMean method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
- Returns
Array with the normalization factors for the nzmean method.
- Return type
numpy array
- Example
>>> import pytransit._tools.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.nzmean_factors(data) >>> print(factors) array([[ 1.14836149], [ 0.88558737]])
See also
-
-
class
pytransit.norm_tools.
NoNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'nonorm'¶
-
-
class
pytransit.norm_tools.
QuantileNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'quantile'¶
-
static
normalize
(data, wigList=[], annotationPath='')[source]¶ Performs Quantile Normalization as described by Bolstad et al. 2003
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
- Returns
Array with the data normalized by the quantile normalization method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> normdata = norm_tools.quantile_norm(data) >>> print(normdata)
See also
-
-
class
pytransit.norm_tools.
TTRNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
empirical_theta
()[source]¶ Calculates the observed density of the data.
This is used as an estimate insertion density by some normalization methods. May be improved by more sophisticated ways later on.
- Parameters
data (numpy array) –
numpy array defining read-counts at N sites.
- Returns
Density of the given dataset.
- Return type
float
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> import pytransit.norm_tools as norm_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> theta = norm_tools.empirical_theta(data) >>> print(theta) 0.467133570136
See also
TTR_factors
-
name
= 'emphist'¶
-
static
normalize
(data, wigList=[], annotationPath='', thetaEst=<function TTRNorm.empirical_theta>, muEst=<function TTRNorm.trimmed_empirical_mu>, target=100.0)[source]¶ Returns the normalization factors for the data, using the TTR method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
thetaEst (function) – Function used to estimate density. Should take a list of counts as input.
muEst (function) – Function used to estimate mean count. Should take a list of counts as input.
- Returns
Array with the normalization factors for the TTR method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.TTR_factors(data) >>> print(factors) array([[ 1. ], [ 0.62862886]])
See also
-
trimmed_empirical_mu
(t=0.05)[source]¶ Estimates the trimmed mean of the data.
This is used as an estimate of mean count by some normalization methods. May be improved by more sophisticated ways later on.
- Parameters
data (numpy array) –
numpy array defining read-counts at N sites.
t (float) – Float specifying fraction of start and end to trim.
- Returns
(Trimmed) Mean of the given dataset.
- Return type
float
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> import pytransit.norm_tools as norm_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> mu = norm_tools.trimmed_empirical_mu(data) >>> print(mu) 120.73077107
See also
TTR_factors
-
-
class
pytransit.norm_tools.
TotReadsNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'totreads'¶
-
static
normalize
(data, wigList=[], annotationPath='')[source]¶ Returns the normalization factors for the data, using the total reads method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
- Returns
Array with the normalization factors for the totreads method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.totreads_factors(data) >>> print(factors) array([[ 1.2988762], [ 0.8129396]])
See also
-
-
class
pytransit.norm_tools.
ZeroInflatedNBNorm
[source]¶ Bases:
pytransit.norm_tools.NormMethod
-
name
= 'zinfb'¶
-
static
normalize
(data, wigList=[], annotationPath='')[source]¶ Returns the normalization factors for the data using the zero-inflated negative binomial method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
- Returns
Array with the normalization factors for the zinfnb method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.zinfnb_factors(data) >>> print(factors) [[ 0.0121883 ] [ 0.00747111]]
See also
-
-
pytransit.norm_tools.
cleaninfgeom
(x, rho)[source]¶ Returns a ‘clean’ output from the geometric distribution.
-
pytransit.norm_tools.
empirical_theta
(X)[source]¶ Calculates the observed density of the data.
This is used as an estimate insertion density by some normalization methods. May be improved by more sophisticated ways later on.
- Parameters
data (numpy array) –
numpy array defining read-counts at N sites.
- Returns
Density of the given dataset.
- Return type
float
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> import pytransit.norm_tools as norm_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> theta = norm_tools.empirical_theta(data) >>> print(theta) 0.467133570136
See also
TTR_factors
-
pytransit.norm_tools.
norm_to_target
(data, target)[source]¶ Returns factors to normalize the data to the given target value.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
target (float) – Floating point specifying the target for the mean of the data/
- Returns
Array with the factors necessary to normalize mean to target.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.norm_to_target(data, 100) >>> print(factors) [[ 1.8548104 ] [ 1.16088726]]
See also
-
pytransit.norm_tools.
normalize_data
(data, method='nonorm', wigList=[], annotationPath='')[source]¶ Normalizes the numpy array by the given normalization method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
method (str) – Name of the desired normalization method.
wigList (list) – List of paths for the desired wig-formatted datasets.
annotationPath (str) – Path to the prot_table annotation file.
- Returns
Array with the normalized data. list: List containing the normalization factors. Empty if not used.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) (normdata, normfactors) = norm_tools.normalize_data(data, "TTR") # Some methods require annotation and path to wig files. >>> print(normfactors) array([[ 1. ], [ 0.62862886]]) >> print(normdata) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]])
Note
Some normalization methods require the wigList and annotationPath arguments.
-
pytransit.norm_tools.
trimmed_empirical_mu
(X, t=0.05)[source]¶ Estimates the trimmed mean of the data.
This is used as an estimate of mean count by some normalization methods. May be improved by more sophisticated ways later on.
- Parameters
data (numpy array) –
numpy array defining read-counts at N sites.
t (float) – Float specifying fraction of start and end to trim.
- Returns
(Trimmed) Mean of the given dataset.
- Return type
float
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> import pytransit.norm_tools as norm_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> mu = norm_tools.trimmed_empirical_mu(data) >>> print(mu) 120.73077107
See also
TTR_factors
-
pytransit.norm_tools.
zinfnb_factors
(data)[source]¶ Returns the normalization factors for the data using the zero-inflated negative binomial method.
- Parameters
data (numpy array) – (K,N) numpy array defining read-counts at N sites for K datasets.
- Returns
Array with the normalization factors for the zinfnb method.
- Return type
numpy array
- Example
>>> import pytransit.norm_tools as norm_tools >>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]]) >>> factors = norm_tools.zinfnb_factors(data) >>> print(factors) [[ 0.0121883 ] [ 0.00747111]]
See also
pytransit.qcDisplay module¶
pytransit.stat_tools module¶
-
pytransit.stat_tools.
BH_fdr_correction
(X)[source]¶ Adjusts p-values using the Benjamini Hochberg procedure
-
pytransit.stat_tools.
bayesian_ess_thresholds
(Z_raw, ALPHA=0.05)[source]¶ Returns Essentiality Thresholds using a BH-like procedure
-
pytransit.stat_tools.
binom
(k, n, p)[source]¶ Binomial distribution. Uses Normal approximation for large ‘n’
-
pytransit.stat_tools.
binom_test
(k, n, p, type='two-sided')[source]¶ Does a binomial test given success, trials and probability.
-
pytransit.stat_tools.
boxcoxTable
(X, minlambda, maxlambda, dellambda)[source]¶ Returns a table of (loglik function, lambda) pairs for the data.
-
pytransit.stat_tools.
boxcoxtransform
(x, lambdax)[source]¶ Performs a box-cox transformation to data vector X. WARNING: elements of X should be all positive! Fixed: ‘>’ has changed to ‘<’
-
pytransit.stat_tools.
loglik
(X, lambdax)[source]¶ Computes the log-likelihood function for a transformed vector Xtransform.
-
pytransit.stat_tools.
phi_coefficient
(X, Y)[source]¶ Calculates the phi-coefficient for two bool arrays
-
pytransit.stat_tools.
resampling
(data1, data2, S=10000, testFunc=<function F_mean_diff_flat>, permFunc=<function F_shuffle_flat>, adaptive=False, lib_str1='', lib_str2='')[source]¶ Does a permutation test on two sets of data.
Performs the resampling / permutation test given two sets of data using a function defining the test statistic and a function defining how to permute the data.
- Parameters
ar – List or numpy array with the first set of observations.
data2 – List or numpy array with the second set of observations.
S – Number of permutation tests (or samples) to obtain.
testFunc – Function defining the desired test statistic. Should accept two lists as arguments. Default is difference in means between the observations.
permFunc – Function defining the way to permute the data. Should accept one argument, the combined set of data. Default is random shuffle.
adaptive – Cuts-off resampling early depending on significance.
- Returns
- Tuple with described values
test_obs – Test statistic of observation.
mean1 – Arithmetic mean of first set of data.
mean2 – Arithmetic mean of second set of data.
log2FC – Normalized log2FC the means.
pval_ltail – Lower tail p-value.
pval_utail – Upper tail p-value.
pval_2tail – Two-tailed p-value.
test_sample – List of samples of the test statistic.
- Example
>>> import pytransit.stat_tools as stat_tools >>> import numpy >>> X = numpy.random.random(100) >>> Y = numpy.random.random(100) >>> (test_obs, mean1, mean2, log2fc, pval_ltail, pval_utail, pval_2tail, test_sample) = stat_tools.resampling(X,Y) >>> pval_2tail 0.2167 >>> test_sample[:3] [0.076213992904990535, -0.0052513291091412784, -0.0038425140184765172]
pytransit.tnseq_tools module¶
-
pytransit.tnseq_tools.
ExpectedRuns
(n, pnon)[source]¶ Expected value of the run of non=insertions (Schilling, 1990):
ER_n = log(1/p)(nq) + gamma/ln(1/p) -1/2 + r1(n) + E1(n)
- Parameters
n (int) – Integer representing the number of sites.
pins (float) – Floating point number representing the probability of non-insertion.
- Returns
Size of the expected maximum run.
- Return type
float
-
class
pytransit.tnseq_tools.
Gene
(orf, name, desc, reads, position, start=0, end=0, strand='')[source]¶ Bases:
object
Class defining a gene with useful attributes for TnSeq analysis.
This class helps define a “gene” with attributes that facilitate TnSeq analysis. Here “gene” can be defined to be any genomic region. The Genes class (with an s) can be used to define list of Gene objects with more useful operations on the “genome” level.
-
orf
¶ A string defining the ID of the gene.
-
name
¶ A string with the human readable name of the gene.
-
desc
¶ A string with the description of the gene.
-
reads
¶ List of lists of read-counts in possible site replicate dataset.
-
position
¶ List of coordinates of the possible sites.
-
start
¶ An integer defining the start coordinate for the gene.
-
end
¶ An integer defining the end coordinate for the gene.
-
strand
¶ A string defining the strand of the gene.
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> G = tnseq_tools.Gene("Rv0001", "dnaA", "DNA Replication A", [[0,0,0,0,1,3,0,1]], [1,21,32,37,45,58,66,130], strand="+" ) >>> print(G) Rv0001 (dnaA) k=3 n=8 r=4 theta=0.37500 >>> print(G.phi()) 0.625 >>> print(G.tosses) array([ 0., 0., 0., 0., 1., 1., 0., 1.])
See also
-
__eq__
(other)[source]¶ Compares against other gene object.
- Returns
True if the gene objects have same orf id.
- Return type
bool
-
__getitem__
(i)[source]¶ Return read-counts at position i.
- Parameters
i (int) – integer of the index of the desired site.
- Returns
Reads at position i.
- Return type
list
-
__lt__
(other)[source]¶ Compares against other gene object.
- Returns
True if the gene object id is less than the other.
- Return type
bool
-
__str__
()[source]¶ Return a string representation of the object.
- Returns
Human readable string with some of the attributes.
- Return type
str
-
get_gap_span
()[source]¶ Returns the span of the maxrun of the gene (i.e. number of nucleotides).
- Returns
Number of nucleotides spanned by the max run.
- Return type
int
-
get_gene_span
()[source]¶ Returns the number of nucleotides spanned by the gene.
- Returns
Number of nucleotides spanned by the gene’s sites.
- Return type
int
-
phi
()[source]¶ Return the non-insertion density (“phi”) for the gene.
- Returns
Non-insertion density (i.e. 1 - theta)
- Return type
float
-
-
class
pytransit.tnseq_tools.
Genes
(wigList, annotation, norm='nonorm', reps='All', minread=1, ignoreCodon=True, nterm=0.0, cterm=0.0, include_nc=False, data=[], position=[], genome='', transposon='himar1')[source]¶ Bases:
object
Class defining a list of Gene objects with useful attributes for TnSeq analysis.
This class helps define a list of Gene objects with attributes that facilitate TnSeq analysis. Includes methods that calculate useful statistics and even rudamentary analysis of essentiality.
-
wigList
¶ List of paths to datasets in .wig format.
-
protTable
¶ String with path to annotation in .prot_table format.
-
norm
¶ String with the normalization used/
-
reps
¶ String with information on how replicates were handled.
-
minread
¶ Integer with the minimum magnitude of read-count considered.
-
ignoreCodon
¶ Boolean defining whether to ignore the start/stop codon.
-
nterm
¶ Float number of the fraction of the N-terminus to ignore.
-
cterm
¶ Float number of the fraction of the C-terminus to ignore.
-
include_nc
¶ Boolean determining whether to include non-coding areas.
-
orf2index
¶ Dictionary of orf id to index in the genes list.
-
genes
¶ List of the Gene objects.
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> G = tnseq_tools.Genes(["transit/data/glycerol_H37Rv_rep1.wig", "transit/data/glycerol_H37Rv_rep2.wig"], "transit/genomes/H37Rv.prot_table", norm="TTR") >>> print(G) Genes Object (N=3990) >>> print(G.global_theta()) 0.40853707222816626 >>> print(G["Rv0001"] # Lookup like dictionary) Rv0001 (dnaA) k=0 n=31 r=31 theta=0.00000 >>> print(G[2] # Lookup like list) Rv0003 (recF) k=5 n=35 r=14 theta=0.14286 >>> print(G[2].reads) [[ 62. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 63. 0. 0. 13. 46. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ] [ 3.14314432 67.26328843 0. 0. 0. 0. 0. 0. 0. 35.20321637 0. 0. 0. 0. 30.80281433 0. 101.20924707 0. 23.25926796 0. 16.97297932 8.17217523 0. 0. 2.51451546 3.77177318 0.62862886 0. 0. 69.14917502 0. 0. 0. 0. 0. ]]
See also
-
__contains__
(item)[source]¶ Defines __contains__ to check if gene exists in the list.
- Parameters
item (str) – String with the id of the gene.
- Returns
Boolean with True if item is in the list.
- Return type
bool
-
__getitem__
(i)[source]¶ Defines __getitem__ method so that it works as dictionary and list.
- Parameters
i (int) – Integer or string defining index or orf ID desired.
- Returns
A gene with the index or ID equal to i.
- Return type
-
__len__
()[source]¶ Defines __len__ returning number of genes.
- Returns
Number of genes in the list.
- Return type
int
-
__str__
()[source]¶ Defines __str__ to print(a generic str with the size of the list.)
- Returns
Human readable string with number of genes in object.
- Return type
str
-
global_insertion
()[source]¶ Returns total number of insertions, i.e. sum of ‘k’ over all genes.
- Returns
Total sum of reads across all genes.
- Return type
float
-
global_phi
()[source]¶ Returns global non-insertion frequency, of the library.
- Returns
Complement of global theta i.e. 1.0-theta
- Return type
float
-
global_reads
()[source]¶ Returns the reads among the library.
- Returns
List of all the data.
- Return type
list
-
global_run
()[source]¶ Returns the run assuming all genes were concatenated together.
- Returns
Max run across all genes.
- Return type
int
-
global_sites
()[source]¶ Returns total number of sites, i.e. sum of ‘n’ over all genes.
- Returns
Total number of sites across all genes.
- Return type
int
-
global_theta
()[source]¶ Returns global insertion frequency, of the library.
- Returns
Total sites with insertions divided by total sites.
- Return type
float
-
local_gap_span
()[source]¶ Returns numpy array with the span of nucleotides of the largest gap, ‘s’, for each gene.
- Returns
Numpy array with the span of gap for all genes.
- Return type
narray
-
local_gene_span
()[source]¶ Returns numpy array with the span of nucleotides of the gene, ‘t’, for each gene.
- Returns
Numpy array with the span of gene for all genes.
- Return type
narray
-
local_insertions
()[source]¶ Returns numpy array with the number of insertions, ‘k’, for each gene.
- Returns
Numpy array with the number of insertions for all genes.
- Return type
narray
-
local_phis
()[source]¶ Returns numpy array of non-insertion frequency, ‘phi’, for each gene.
- Returns
Numpy array with the complement of density for all genes.
- Return type
narray
-
local_reads
()[source]¶ Returns numpy array of lists containing the read counts for each gene.
- Returns
Numpy array with the list of reads for all genes.
- Return type
narray
-
local_runs
()[source]¶ Returns numpy array with maximum run of non-insertions, ‘r’, for each gene.
- Returns
Numpy array with the max run of non-insertions for all genes.
- Return type
narray
-
local_sites
()[source]¶ Returns numpy array with total number of TA sites, ‘n’, for each gene.
- Returns
Numpy array with the number of sites for all genes.
- Return type
narray
-
local_thetas
()[source]¶ Returns numpy array of insertion frequencies, ‘theta’, for each gene.
- Returns
Numpy array with the density for all genes.
- Return type
narray
-
-
pytransit.tnseq_tools.
GumbelCDF
(x, u, B)[source]¶ CDF of the Gumbel distribution:
e^(-e^( (u-x)/B))
- Parameters
x (int) – Length of the max run.
u (float) – Location parameter of the Gumbel dist.
B (float) – Scale parameter of the Gumbel dist.
- Returns
Cumulative probability o the Gumbel distribution.
- Return type
float
-
pytransit.tnseq_tools.
VarR
(n, pnon)[source]¶ Variance of the expected run of non-insertons (Schilling, 1990):
\[VarR_n = (pi^2)/(6*ln(1/p)^2) + 1/12 + r2(n) + E2(n)\]- Parameters
n (int) – Integer representing the number of sites.
pnon (float) – Floating point number representing the probability of non-insertion.
- Returns
Variance of the length of the maximum run.
- Return type
float
-
pytransit.tnseq_tools.
check_wig_includes_zeros
(wig_list)[source]¶ Returns boolean list showing whether the given files include empty sites (zero) or not.
- Parameters
wig_list (list) – List of paths to wig files.
- Returns
List of boolean values.
- Return type
list
-
pytransit.tnseq_tools.
combine_replicates
(data, method='Sum')[source]¶ Returns list of data merged together.
- Parameters
data (list) – List of numeric (replicate) data to be merged.
method (str) – How to combine the replicate dataset.
- Returns
List of numeric dataset now merged together.
- Return type
list
-
pytransit.tnseq_tools.
get_coordinate_map
(galign_path, reverse=False)[source]¶ Attempts to get mapping of coordinates from galign file.
- Parameters
path (str) – Path to .galign file.
reverse (bool) – Boolean specifying whether to do A to B or B to A.
- Returns
Dictionary of coordinate in one file to another file.
- Return type
dict
-
pytransit.tnseq_tools.
get_data
(wig_list)[source]¶ - Returns a tuple of (data, position) containing a matrix of raw read-counts
, and list of coordinates.
- Parameters
wig_list (list) – List of paths to wig files.
- Returns
Two lists containing data and positions of the wig files given.
- Return type
tuple
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_data(["data/glycerol_H37Rv_rep1.wig", "data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]])
-
pytransit.tnseq_tools.
get_data_zero_fill
(wig_list)[source]¶ - Returns a tuple of (data, position) containing a matrix of raw read counts,
and list of coordinates. Positions that are missing are filled in as zero.
- Parameters
wig_list (list) – List of paths to wig files.
- Returns
Two lists containing data and positions of the wig files given.
- Return type
tuple
-
pytransit.tnseq_tools.
get_file_types
(wig_list)[source]¶ Returns the transposon type (himar1/tn5) of the list of wig files.
- Parameters
wig_list (list) – List of paths to wig files.
- Returns
List of transposon type (“himar1” or “tn5”).
- Return type
list
-
pytransit.tnseq_tools.
get_gene_info
(path)[source]¶ Returns a dictionary that maps gene id to gene information.
- Parameters
path (str) – Path to annotation in .prot_table or GFF3 format.
- Returns
- Dictionary of gene id to tuple of information:
name
description
start coordinate
end coordinate
strand
- Return type
dict
-
pytransit.tnseq_tools.
get_gene_info_gff
(path)[source]¶ Returns a dictionary that maps gene id to gene information.
- Parameters
path (str) – Path to annotation in GFF3 format.
- Returns
- Dictionary of gene id to tuple of information:
name
description
start coordinate
end coordinate
strand
- Return type
dict
-
pytransit.tnseq_tools.
get_gene_info_pt
(path)[source]¶ Returns a dictionary that maps gene id to gene information.
- Parameters
path (str) – Path to annotation in .prot_table format.
- Returns
- Dictionary of gene id to tuple of information:
name
description
start coordinate
end coordinate
strand
- Return type
dict
-
pytransit.tnseq_tools.
get_genes_in_range
(pos_hash, start, end)[source]¶ Returns list of genes that occur in a given range of coordinates.
- Parameters
pos_hash (dict) – Dictionary of position to list of genes.
start (int) – Start coordinate of the desired range.
end (int) – End coordinate of the desired range.
- Returns
List of genes that fall within range.
- Return type
list
-
pytransit.tnseq_tools.
get_pos_hash
(path)[source]¶ Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.
- Parameters
path (str) – Path to annotation in .prot_table or GFF3 format.
- Returns
Dictionary of position to list of genes that share that position.
- Return type
dict
-
pytransit.tnseq_tools.
get_pos_hash_gff
(path)[source]¶ Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.
- Parameters
path (str) – Path to annotation in GFF3 format.
- Returns
Dictionary of position to list of genes that share that position.
- Return type
dict
-
pytransit.tnseq_tools.
get_pos_hash_pt
(path)[source]¶ Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.
- Parameters
path (str) – Path to annotation in .prot_table format.
- Returns
Dictionary of position to list of genes that share that position.
- Return type
dict
-
pytransit.tnseq_tools.
get_wig_stats
(path)[source]¶ Returns statistics for the given wig file with read-counts.
- Parameters
path (str) – String with the path to the wig file of interest.
- Returns
- Tuple with the following statistical measures:
density
mean read
non-zero mean
non-zero median
max read
total reads
skew
kurtosis
- Return type
tuple
-
pytransit.tnseq_tools.
griffin_analysis
(genes_obj, pins)[source]¶ Implements the basic Gumbel analysis of runs of non-insertion, described in Griffin et al. 2011.
This analysis method calculates a p-value of observing the maximun run of TA sites without insertions in a row (i.e. a “run”, r). Unusually long runs are indicative of an essential gene or protein domain. Assumes that there is a constant, global probability of observing an insertion (tantamount to a Bernoulli probability of success).
- Parameters
genes_obj (Genes) – An object of the Genes class defining the genes.
pins (float) – The probability of insertion.
- Returns
- List of lists with results and information for the genes. The elements of the list are as follows:
ORF ID.
Gene Name.
Gene Description.
Number of TA sites with insertions.
Number of TA sites.
Length of largest run of non-insertion.
Expected run for a gene this size.
p-value of the observed run.
- Return type
list
-
pytransit.tnseq_tools.
maxrun
(lst, item=0)[source]¶ Returns the length of the maximum run an item in a given list.
- Parameters
lst (list) – List of numeric items.
item (float) – Number to look for consecutive runs of.
- Returns
Length of the maximum run of consecutive instances of item.
- Return type
int
-
pytransit.tnseq_tools.
read_combined_wig
(fname)[source]¶ Read the combined wig-file generated by Transit :: Filename -> Tuple([Site], [WigData], [Filename]) Site :: Integer WigData :: [Number] Filename :: String
-
pytransit.tnseq_tools.
read_genes
(fname, descriptions=False)[source]¶ (Filename, Options) -> [Gene] Gene :: {start, end, rv, gene, strand}
-
pytransit.tnseq_tools.
read_genome
(path)[source]¶ Reads in FASTA formatted genome file.
- Parameters
path (str) – Path to .galign file.
- Returns
String with the genomic sequence.
- Return type
string
-
pytransit.tnseq_tools.
read_samples_metadata
(metadata_file, covarsToRead=[], interactionsToRead=[], condition_name='Condition')[source]¶ Filename -> ConditionMap ConditionMap :: {Filename: Condition}, [{Filename: Covar}], [{Filename: Interaction}] Condition :: String Covar :: String Interaction :: String
-
pytransit.tnseq_tools.
runindex
(runs)[source]¶ Returns a list of the indexes of the start of the runs; complements runs().
- Parameters
runs (list) – List of numeric data.
- Returns
List of the index of the runs of non-insertions. Non-zero sites are treated as runs of zero.
- Return type
list
-
pytransit.tnseq_tools.
runs
(data)[source]¶ Return list of all the runs of consecutive non-insertions.
- Parameters
data (list) – List of numeric data.
- Returns
List of the length of the runs of non-insertions. Non-zero sites are treated as runs of zero.
- Return type
list
-
pytransit.tnseq_tools.
runs_w_info
(data)[source]¶ Return list of all the runs of consecutive non-insertions with the start and end locations.
- Parameters
data (list) – List of numeric data to check for runs.
- Returns
List of dictionary from run to length and position information of the tun.
- Return type
list
pytransit.transit_gui module¶
-
class
pytransit.transit_gui.
TnSeekFrame
(parent, DEBUG=False)[source]¶
pytransit.transit_tools module¶
-
pytransit.transit_tools.
cleanargs
(rawargs)[source]¶ Returns a list and a dictionary with positional and keyword arguments.
- -This function assumes flags must start with a “-” and and cannot be a
number (but can include them).
- -Flags should either be followed by the value they want to be associated
with (i.e. -p 5) or will be assigned a value of True in the dictionary.
- -The dictionary will map flags to the name given minus ONE “-” sign in
front. If you use TWO minus signs in the flag name (i.e. –verbose), the dictionary key will be the name with ONE minus sign in front (i.e. {“-verbose”:True}).
- Parameters
rawargs (list) – List of positional/keyword arguments. As obtained from sys.argv.
- Returns
- List of positional arguments (i.e. arguments without flags),
in order provided.
- dict: Dictionary mapping flag (key is flag minus the first “-“) and
their values.
- Return type
list
-
pytransit.transit_tools.
convertToCombinedWig
(dataset_list, annotationPath, outputPath, normchoice='nonorm')[source]¶ Normalizes the input datasets and outputs the result in CombinedWig format.
- Parameters
dataset_list (list) – List of paths to datasets in .wig format
annotationPath (str) – Path to annotation in .prot_table or GFF3 format.
outputPath (str) – Desired output path.
normchoice (str) – Choice for normalization method.
-
pytransit.transit_tools.
convertToGeneCountSummary
(dataset_list, annotationPath, outputPath, normchoice='nonorm')[source]¶ Normalizes the input datasets and outputs the result in CombinedWig format.
- Parameters
dataset_list (list) – List of paths to datasets in .wig format
annotationPath (str) – Path to annotation in .prot_table or GFF3 format.
outputPath (str) – Desired output path.
normchoice (str) – Choice for normalization method.
-
pytransit.transit_tools.
convertToIGV
(self, dataset_list, annotationPath, path, normchoice=None)[source]¶
-
pytransit.transit_tools.
get_extended_pos_hash
(path)[source]¶ Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.
- Parameters
path (str) – Path to annotation in .prot_table or GFF3 format.
- Returns
Dictionary of position to list of genes that share that position.
- Return type
dict
-
pytransit.transit_tools.
get_gene_info
(path)[source]¶ Returns a dictionary that maps gene id to gene information.
- Parameters
path (str) – Path to annotation in .prot_table or GFF3 format.
- Returns
- Dictionary of gene id to tuple of information:
name
description
start coordinate
end coordinate
strand
- Return type
dict
-
pytransit.transit_tools.
get_pos_hash
(path)[source]¶ Returns a dictionary that maps coordinates to a list of genes that occur at that coordinate.
- Parameters
path (str) – Path to annotation in .prot_table or GFF3 format.
- Returns
Dictionary of position to list of genes that share that position.
- Return type
dict
-
pytransit.transit_tools.
get_validated_data
(wig_list, wxobj=None)[source]¶ - Returns a tuple of (data, position) containing a matrix of raw read-counts
, and list of coordinates.
- Parameters
wig_list (list) – List of paths to wig files.
wxobj (object) – wxPython GUI object for warnings
- Returns
Two lists containing data and positions of the wig files given.
- Return type
tuple
- Example
>>> import pytransit.tnseq_tools as tnseq_tools >>> (data, position) = tnseq_tools.get_validated_data(["data/glycerol_H37Rv_rep1.wig", "data/glycerol_H37Rv_rep2.wig"]) >>> print(data) array([[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., 0., 0.]])
See also
get_file_types
combine_replicates
get_data_zero_fill
pytransit.norm_tools.normalize_data