Coverage for /home/vivian/gffPandas/gffpandas/gffpandas/gffpandas.py : 100%

"""Creating 'Gff3DataFrame' class for bundling data and functionalities together."""
"""Create an instance.""" else:
"""Create a pd dataframe.
The gff3 file is read with the pandas library, and a pd dataframe with the given column names is returned."""
names=["seq_id", "source", "type", "start", "end", "score", "strand", "phase", "attributes"])
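The reading step described in this docstring can be sketched roughly as follows. This is a minimal illustration and not the module's own code: the helper name `read_gff3`, the constant `GFF3_COLUMNS`, and the example records are assumptions made for the sketch. It also assumes that `#` never occurs inside a data line, because `comment="#"` would truncate such a line.

```python
import io

import pandas as pd

# The nine GFF3 column names listed in the source fragment.
GFF3_COLUMNS = ["seq_id", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]

# Hypothetical GFF3 content, used only for illustration.
EXAMPLE_GFF3 = (
    "##gff-version 3\n"
    "chr1\texample\tgene\t1\t100\t.\t+\t.\tID=gene1;Name=abc\n"
    "chr1\texample\tCDS\t10\t90\t.\t+\t0\tID=cds1;Parent=gene1\n"
)


def read_gff3(handle):
    """Read tab-separated GFF3 records, skipping lines that start with '#'."""
    return pd.read_csv(handle, sep="\t", comment="#", names=GFF3_COLUMNS)


df = read_gff3(io.StringIO(EXAMPLE_GFF3))
```

Passing a file path instead of the `io.StringIO` object should work the same way, since `pd.read_csv` accepts both.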
"""Create a header.
The header of the gff file is read, i.e. all lines that start with '#'."""
else:
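One way to collect such header lines is sketched below. The function name `read_header` and the example text are hypothetical. The sketch assumes that the header is the leading block of '#' lines and stops at the first data line; comment lines that appear later in the file would not be collected.

```python
import io


def read_header(handle):
    """Return the leading lines that start with '#', joined into one string."""
    header = ""
    for line in handle:
        if not line.startswith("#"):
            # Assumption: the header ends at the first non-comment line.
            break
        header += line
    return header


# Hypothetical file content, used only for illustration.
example = io.StringIO(
    "##gff-version 3\n"
    "##sequence-region chr1 1 1000\n"
    "chr1\texample\tgene\t1\t100\t.\t+\t.\tID=gene1\n"
)
header = read_header(example)
```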
"""Create a csv or tsv file.
The pd dataframe is saved as a csv file or, optionally, as a tsv file."""
else:
header=["seq_id", "source", "type", "start", "end", "score", "strand", "phase", "attributes"])
"""Create a gff3 file.
The pd dataframe is saved as a gff3 file."""
header=None)
"""Filtering the pd dataframe by a feature_type.
A feature type has to be given for this method, e.g. 'CDS'."""
"""Filtering the pd dataframe by the gene_length.
The desired minimal and maximal bp lengths have to be given for this method."""
(gene_length <= max_length)]
input_header=self._header)
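A length filter of this kind might look like the sketch below. The function name `filter_by_length` and the example data are hypothetical. The length is computed here as `end - start + 1` because GFF3 coordinates are 1-based and inclusive; this is an assumption of the sketch, and the module itself may compute the length differently.

```python
import pandas as pd


def filter_by_length(df, min_length, max_length):
    """Keep the rows whose feature length lies within [min_length, max_length]."""
    # Assumption: inclusive 1-based coordinates, so length = end - start + 1.
    length = df["end"] - df["start"] + 1
    return df[(length >= min_length) & (length <= max_length)]


# Hypothetical features with lengths 100, 81 and 1001 bp.
features = pd.DataFrame({
    "type": ["gene", "CDS", "gene"],
    "start": [1, 10, 200],
    "end": [100, 90, 1200],
})
filtered = filter_by_length(features, min_length=50, max_length=500)
```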
"""Saving each attribute-tag to a single column.
The attribute column is split into 14 single columns. This method returns only a data frame and not an object. Therefore, this data frame cannot be saved as a gff3 file."""
lambda attributes: dict([key_value_pair.split('=') for key_value_pair in attributes.split(';')]))
lambda at_dic: list(at_dic.keys()))
from_iterable(attribute_df['at_dic_keys']))
at_dic.get(atr))
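The splitting of the attribute column into one column per tag can be sketched as follows. The helper names `parse_attributes` and `attributes_to_columns` are hypothetical, and the approach (building a second data frame from the parsed dictionaries) is a compact stand-in rather than the module's own implementation. Entries without an '=' are skipped, which is an assumption of the sketch.

```python
import pandas as pd


def parse_attributes(attributes):
    """Turn 'tag1=value1;tag2=value2' into a dict of tag -> value."""
    pairs = (item.split("=", 1) for item in attributes.split(";") if "=" in item)
    return dict(pairs)


def attributes_to_columns(df):
    """Return a copy of df with one extra column per attribute tag."""
    parsed = df["attributes"].apply(parse_attributes)
    # Tags missing from a row end up as NaN in that row.
    tag_df = pd.DataFrame(parsed.tolist(), index=df.index)
    return pd.concat([df, tag_df], axis=1)


# Hypothetical attribute strings, used only for illustration.
annotations = pd.DataFrame({
    "type": ["gene", "CDS"],
    "attributes": ["ID=gene1;Name=abc", "ID=cds1;Parent=gene1"],
})
wide = attributes_to_columns(annotations)
```

The number of resulting columns depends on the tags present in the file, so the count of 14 mentioned in the docstring would only hold for a file with exactly those tags.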
"""Filtering the pd dataframe by a attribute.
The 9th column of a gff3-file contains the list of feature attributes in a tag=value format. For this method the desired attribute tag as well as the corresponding value have to be given. If the value is not available an empty dataframe would be returned.""" input_header=self._header)
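A filter on a tag=value pair could be sketched like this. The function name `filter_by_attribute` and the example data are hypothetical. The sketch matches the exact `tag=value` entry within the semicolon-separated list, so a query for `ID=gene1` does not match `Parent=gene1`; whether the module matches in the same way is not confirmed by the source.

```python
import pandas as pd


def filter_by_attribute(df, tag, value):
    """Keep the rows whose attributes contain exactly 'tag=value'."""
    wanted = f"{tag}={value}"
    mask = df["attributes"].apply(lambda attrs: wanted in attrs.split(";"))
    return df[mask]


# Hypothetical attribute strings, used only for illustration.
annotated = pd.DataFrame({
    "type": ["gene", "CDS"],
    "attributes": ["ID=gene1;Name=abc", "ID=cds1;Parent=gene1"],
})
hits = filter_by_attribute(annotated, "ID", "gene1")
no_hits = filter_by_attribute(annotated, "ID", "missing")
```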
"""Gives the following statistics for the data:
The maximal bp length, the minimal bp length, the count of sense (+) and antisense (-) strands, as well as the count of each available feature type."""
'Maximal_bp_length': gene_length.max(),
'Minimal_bp_length': gene_length.min(),
'Counted_strands': strand_counts,
'Counted_feature_types': type_counts
}
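The statistics named in this docstring can be gathered along the following lines. The dictionary keys are taken from the source fragment; the function name `summarize` and the example data are hypothetical, and the length formula `end - start + 1` is an assumption based on inclusive GFF3 coordinates.

```python
import pandas as pd


def summarize(df):
    """Collect length extremes plus strand and feature-type counts."""
    # Assumption: inclusive 1-based coordinates, so length = end - start + 1.
    gene_length = df["end"] - df["start"] + 1
    return {
        "Maximal_bp_length": gene_length.max(),
        "Minimal_bp_length": gene_length.min(),
        "Counted_strands": df["strand"].value_counts(),
        "Counted_feature_types": df["type"].value_counts(),
    }


# Hypothetical features, used only for illustration.
records = pd.DataFrame({
    "type": ["gene", "CDS", "gene"],
    "start": [1, 10, 200],
    "end": [100, 90, 1200],
    "strand": ["+", "+", "-"],
})
stats = summarize(records)
```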
type=None, strand=None, complement=False):
"""Find the entries which overlap with a feature to compare.
For this method, the chromosome accession number has to be given. The start and end bp positions of the feature to compare have to be given as well, and optionally its feature type and whether it is on the sense (+) or antisense (-) strand. \n
Possible overlaps: \n
_______nnnnnnnnnnnnnnnnnnnnnn__________ --> sequence to compare (y) \n
xxxxxxxxxxxxxxxxxxxx______________________ --> overlapping at the beginning of y \n
____________________xxxxxxxxxxxxxxxxxxxxxx --> overlapping at the end of y \n
_______xxxxxxxxxxxxxx_____________________ --> starts are identical and end is within y \n
___________________xxxxxxxxxxxxxx_________ --> ends are identical and start is within y \n
_______xxxxxxxxxxxxxxxxxxxxxxxxx__________ --> start and end positions are identical to start and end of y \n
By selecting 'complement=True', all features which do not overlap with the feature to compare are returned."""
(overlap_df.start < end)) |
((overlap_df.end > start) & (overlap_df.end < end)) |
((overlap_df.start < start) & (overlap_df.end > start)) |
((overlap_df.start == start) & (overlap_df.end == end)) |
((overlap_df.start == start) & (overlap_df.end > end)) |
((overlap_df.start < start) & (overlap_df.end == end)))
else:
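For comparison, an overlap query can also be written with a single interval-intersection test instead of the case-by-case conditions above. This is a swapped-in technique, not the module's method: two closed intervals overlap when each starts no later than the other ends. Unlike the listed cases, this test also counts features that merely touch a boundary of the query or fully contain it. The function name `find_overlaps`, the parameter name `feature_type`, and the example data are hypothetical.

```python
import pandas as pd


def find_overlaps(df, seq_id, start, end, feature_type=None, strand=None,
                  complement=False):
    """Return the features on seq_id that overlap [start, end], or the rest."""
    candidates = df[df["seq_id"] == seq_id]
    if feature_type is not None:
        candidates = candidates[candidates["type"] == feature_type]
    if strand is not None:
        candidates = candidates[candidates["strand"] == strand]
    # Closed intervals intersect when each one starts before the other ends.
    overlaps = (candidates["start"] <= end) & (candidates["end"] >= start)
    return candidates[~overlaps] if complement else candidates[overlaps]


# Hypothetical features, used only for illustration.
genome = pd.DataFrame({
    "seq_id": ["chr1", "chr1", "chr1", "chr2"],
    "type": ["gene", "CDS", "gene", "gene"],
    "start": [1, 90, 300, 50],
    "end": [100, 200, 400, 150],
    "strand": ["+", "+", "-", "+"],
})
overlapping = find_overlaps(genome, "chr1", 80, 120)
non_overlapping = find_overlaps(genome, "chr1", 80, 120, complement=True)
```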
"""Find entries which are redundant.
For this method, the chromosome accession number (seq_id) as well as the feature type have to be given. All entries that are redundant with respect to start and end position as well as strand are then found."""
'strand']].duplicated()]
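The redundancy check can be sketched with `DataFrame.duplicated`, which the surviving fragment also uses. The function name `find_duplicated_entries` and the example data are hypothetical. With the default `keep="first"`, the first occurrence of each start/end/strand combination is not flagged, so only the later repeats are returned.

```python
import pandas as pd


def find_duplicated_entries(df, seq_id, feature_type):
    """Return repeated entries sharing start, end and strand on one seq_id."""
    subset = df[(df["seq_id"] == seq_id) & (df["type"] == feature_type)]
    # duplicated() flags every repeat after the first occurrence.
    return subset[subset[["start", "end", "strand"]].duplicated()]


# Hypothetical features: rows 0 and 2 describe the same gene interval.
entries = pd.DataFrame({
    "seq_id": ["chr1", "chr1", "chr1", "chr1"],
    "type": ["gene", "gene", "gene", "CDS"],
    "start": [1, 50, 1, 1],
    "end": [100, 150, 100, 100],
    "strand": ["+", "+", "+", "+"],
})
redundant = find_duplicated_entries(entries, "chr1", "gene")
```

Passing `keep=False` to `duplicated` would instead flag every member of a redundant group, including the first one.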