Alignment methods

biscot.Alignment.get_leftmost_label(label_list, channel, reference_map)[source]

Looks up the position on the anchor map of every label id in label_list and returns the label id with the smallest position

Parameters
  • label_list (list(int)) – List of label ids

  • channel (int (1 or 2)) – Enzyme channel to consider to extract label position

  • reference_map (dict(int, Map)) – Dict containing anchor Map objects

Returns

The label id that satisfies label_position = min(all_label_positions)

Return type

int

biscot.Alignment.get_rightmost_label(label_list, channel, reference_map)[source]

Looks up the position on the anchor map of every label id in label_list and returns the label id with the largest position

Parameters
  • label_list (list(int)) – List of label ids

  • channel (int (1 or 2)) – Enzyme channel to consider to extract label position

  • reference_map (dict(int, Map)) – Dict containing anchor Map objects

Returns

The label id that satisfies label_position = max(all_label_positions)

Return type

int
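
The selection rule shared by get_leftmost_label and get_rightmost_label can be sketched as follows. This is a minimal illustration, not the biscot implementation: the anchor map is replaced by a plain dict from label id to position, and the names leftmost_label, rightmost_label and label_positions are hypothetical.

```python
# Hypothetical stand-in for one channel of an anchor map:
# label id -> position on the anchor. Not the biscot Map class.
label_positions = {1: 1200.0, 2: 8450.5, 3: 15020.0, 4: 23110.0}


def leftmost_label(label_ids, positions):
    """Return the label id with the smallest position."""
    return min(label_ids, key=lambda label_id: positions[label_id])


def rightmost_label(label_ids, positions):
    """Return the label id with the largest position."""
    return max(label_ids, key=lambda label_id: positions[label_id])


print(leftmost_label([3, 2, 4], label_positions))   # 2
print(rightmost_label([3, 2, 4], label_positions))  # 4
```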

biscot.Alignment.get_shared_labels(aln_1, aln_2)[source]

Parses two Alignment objects and returns the anchor map label ids to which both contig maps are aligned

Parameters
  • aln_1 (Alignment) – First Alignment object

  • aln_2 (Alignment) – Second Alignment object

Returns

A tuple of two lists, one per channel, containing the reference label ids shared by aln_1 and aln_2

Return type

tuple(list(int), list(int))
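
The per-channel overlap can be pictured as a set intersection. The sketch below assumes each alignment is reduced to a dict from channel to the reference label ids it covers; that layout and the helper name shared_labels are hypothetical and are not taken from biscot.

```python
def shared_labels(labels_1, labels_2):
    """Return the sorted reference label ids present in both lists."""
    return sorted(set(labels_1) & set(labels_2))


# Hypothetical reference label ids covered by two alignments, per channel.
aln_1_labels = {1: [10, 11, 12, 13], 2: [40, 41]}
aln_2_labels = {1: [12, 13, 14], 2: [42, 43]}

# One list per channel, mirroring the documented return type.
shared = tuple(
    shared_labels(aln_1_labels[channel], aln_2_labels[channel])
    for channel in (1, 2)
)
print(shared)  # ([12, 13], [])
```

An empty list for a channel means the two contig maps share no anchor label on that channel.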

biscot.Alignment.line_to_alignment(line, channel)[source]

Converts an xmap line to an Alignment object

Parameters
  • line (str) – A line of an xmap file

  • channel (int) – Enzyme channel to consider

Returns

An alignment object

Return type

Alignment
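
As a rough illustration of what converting an xmap line involves, the sketch below splits a tab-separated line into typed fields. It assumes the leading columns follow the Bionano XMAP layout (XmapEntryID, QryContigID, RefContigID, QryStartPos, QryEndPos, RefStartPos, RefEndPos, Orientation, Confidence). XmapRecord and parse_xmap_line are hypothetical names, and the record is not the biscot Alignment class.

```python
from dataclasses import dataclass


@dataclass
class XmapRecord:
    """Hypothetical container for the leading columns of an xmap line."""
    entry_id: int
    query_id: int
    reference_id: int
    query_start: float
    query_end: float
    reference_start: float
    reference_end: float
    orientation: str
    confidence: float
    channel: int


def parse_xmap_line(line, channel):
    """Parse one non-header xmap line (assumed tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    return XmapRecord(
        entry_id=int(fields[0]),
        query_id=int(fields[1]),
        reference_id=int(fields[2]),
        query_start=float(fields[3]),
        query_end=float(fields[4]),
        reference_start=float(fields[5]),
        reference_end=float(fields[6]),
        orientation=fields[7],
        confidence=float(fields[8]),
        channel=channel,
    )


# Made-up example line; header lines starting with '#' must be skipped first.
example = "1\t12\t3\t528.0\t250310.5\t10020.0\t259800.0\t+\t35.2"
record = parse_xmap_line(example, channel=1)
print(record.reference_id, record.orientation)
```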

biscot.Alignment.parse_xmap(reference_maps_dict, xmap_1_path, xmap_2_path, deleted_xmap_records, xmap_two_enzymes_path='', only_confirmed_positions=False)[source]

Parses one to three xmap files and converts their lines to Alignment objects

Parameters
  • reference_maps_dict (dict(int, Map)) – Dict containing anchor Map objects

  • xmap_1_path (str) – Path to the first xmap file

  • xmap_2_path (str) – Path to the second xmap file

  • deleted_xmap_records (dict(int, Alignment)) – Dict containing Alignment objects that were deleted due to a larger alignment being found

  • xmap_two_enzymes_path (str, optional) – Path to the 2-enzyme xmap file, defaults to “”

  • only_confirmed_positions (bool, optional) – If True, only alignments found in xmap_1 or xmap_2 AND in the 2-enzyme xmap are kept, defaults to False
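
The only_confirmed_positions rule reads as "(in xmap_1 OR in xmap_2) AND in the 2-enzyme xmap". A minimal sketch of that logic is shown below. It assumes each alignment can be identified by a hashable key such as (contig map id, anchor id); how biscot actually matches alignments across xmap files is not described here, so both the key and the function name are hypothetical.

```python
def confirmed_alignments(xmap_1_keys, xmap_2_keys, two_enzyme_keys):
    """Keep keys seen in xmap_1 or xmap_2 that also appear in the 2-enzyme xmap."""
    single_enzyme = set(xmap_1_keys) | set(xmap_2_keys)
    return single_enzyme & set(two_enzyme_keys)


# Hypothetical keys: (contig map id, anchor id).
xmap_1_keys = [(12, 3), (15, 3)]
xmap_2_keys = [(15, 3), (18, 4)]
two_enzyme_keys = [(12, 3), (18, 4), (20, 5)]

print(sorted(confirmed_alignments(xmap_1_keys, xmap_2_keys, two_enzyme_keys)))
# [(12, 3), (18, 4)]
```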

biscot.Alignment.print_agp(reference_maps_dict, key_dict, deleted_xmap_records, contigs_map_dict)[source]

Searches for shared labels between two Alignment objects and calls the matching AGP-printing function (with or without intersection)

Parameters
  • reference_maps_dict (dict(int, Map)) – Dict containing anchor maps

  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between contigs and contig maps

  • deleted_xmap_records (dict(int, Alignment)) – Dict containing smaller alignments that weren’t retained when parsing xmaps

  • contigs_map_dict (dict(int, Map)) – Dict containing contig maps

Returns

List of contig map ids that were used to build the current scaffold

Return type

list(int)

biscot.Alignment.print_agp_line_no_intersection(aln_1, aln_2, previous_ref_end, previously_scaffolded_maps, key_dict, previous_part_number)[source]

Prints an AGP line, formatted following the AGP2 standard. Used when two contig maps don’t share anchor labels.

Parameters
  • aln_1 (Alignment) – Alignment that has the smaller reference_start

  • aln_2 (Alignment) – Alignment that has the larger reference_start

  • previous_ref_end (int) – Current position in the scaffold being built

  • previously_scaffolded_maps (list(int)) – Contig map ids that were previously used to build the current scaffold

  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between contigs and contig maps

  • previous_part_number (int) – Current AGP line id

Returns

Current position in the scaffold being built after applying the changes

Return type

int
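
For readers unfamiliar with AGP, a component line in the AGP 2.0 layout has nine tab-separated columns: object, object_beg, object_end, part_number, component_type, component_id, component_beg, component_end and orientation. The sketch below formats such a line and returns the new end position, echoing the "current position" return value documented above. The function name and the example identifiers are hypothetical, and the code is not biscot's own.

```python
def agp_component_line(obj, previous_end, part_number, component_id,
                       component_length, orientation="+"):
    """Format a 'W' component line placed right after previous_end.

    Returns the line and the new end position in the object.
    """
    object_beg = previous_end + 1
    object_end = previous_end + component_length
    fields = [obj, object_beg, object_end, part_number, "W",
              component_id, 1, component_length, orientation]
    return "\t".join(str(field) for field in fields), object_end


# Hypothetical scaffold and contig names.
line, new_end = agp_component_line("Super-Scaffold_1", 0, 1, "contig_12", 5000)
print(line)
print(new_end)  # 5000
```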

biscot.Alignment.print_agp_line_with_intersection(aln_1, aln_2, previous_ref_end, previously_scaffolded_maps, contigs_map_dict, previous_part_number, key_dict)[source]

Prints an AGP line, formatted following the AGP2 standard. Used when two contig maps share anchor labels.

Parameters
  • aln_1 (Alignment) – Alignment that has the smaller reference_start

  • aln_2 (Alignment) – Alignment that has the larger reference_start

  • previous_ref_end (int) – Current position in the scaffold that is being built

  • previously_scaffolded_maps (list(int)) – Contig map ids that were previously used to build the current scaffold

  • contigs_map_dict (dict(int, Map)) – Dict containing contig maps

  • previous_part_number (int) – Id of the previous line

  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between contigs and contig maps

Returns

Current position in the scaffold after applying the changes

Return type

int

biscot.Alignment.print_gap_line(aln_1, aln_2, reference_id, previous_reference_end, previous_part_number, key_dict)[source]

Prints an AGP line, formatted following the AGP2 standard. Used to print an ‘N’ line.

Parameters
  • aln_1 (Alignment) – Alignment that has the smaller reference_start

  • aln_2 (Alignment) – Alignment that has the larger reference_start

  • reference_id (int) – Id of the anchor map

  • previous_reference_end (int) – Current position in the scaffold being built

  • previous_part_number (int) – Current AGP line id

  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between contigs and contig maps

Returns

Current position in the scaffold being built after applying changes

Return type

int
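
An ‘N’ line in AGP 2.0 describes a gap of known length and uses the columns object, object_beg, object_end, part_number, component_type, gap_length, gap_type, linkage and linkage_evidence. The sketch below formats one such line. The gap_type "scaffold", linkage "yes" and linkage_evidence "map" are values allowed by the standard and chosen here as assumptions; the values biscot actually writes may differ, and the function name is hypothetical.

```python
def agp_gap_line(obj, previous_end, part_number, gap_length):
    """Format an 'N' gap line placed right after previous_end.

    Returns the line and the new end position in the object.
    """
    object_beg = previous_end + 1
    object_end = previous_end + gap_length
    # Assumed gap description: a scaffold gap with linkage supported by a map.
    fields = [obj, object_beg, object_end, part_number, "N",
              gap_length, "scaffold", "yes", "map"]
    return "\t".join(str(field) for field in fields), object_end


# Hypothetical scaffold name; the gap follows a component ending at 5000.
gap_line, gap_end = agp_gap_line("Super-Scaffold_1", 5000, 2, 1300)
print(gap_line)
print(gap_end)  # 6300
```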

biscot.Alignment.solve_alignment_containment(reference_maps_dict, contigs_map_dict, key_dict)[source]

Calls the contained alignment solver function for each alignment couple

Parameters
  • contained_alignments – Tuple containing the contained alignment (second position) and the large alignment (first position)

  • contigs_map_dict (dict(int, Map)) – Dict containing contig maps

  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between contigs and contig maps

biscot.Alignment.solve_containment(aln_couple, reference_maps_dict, contig_maps_dict, key_dict)[source]

Tries to integrate a small map into a larger one. Consider a Map 1 that is aligned on the reference from position 1 to 100 and a Map 2 that is aligned on the reference from position 25 to 75. The goal of this function is to break the alignment of Map 1 into two alignments (1-25 and 75-100).

Parameters
  • aln_couple (tuple(Alignment, Alignment)) – Two Alignment objects. The first one being the ‘small alignment’ and the second, the ‘large alignment’

  • reference_maps_dict (dict(int, Map)) – Dict of anchor Map objects

  • contig_maps_dict (dict(int, Map)) – Dict of contig Map objects

  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between Map objects and actual sequences
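
The Map 1 / Map 2 example above reduces to splitting one interval around another. The sketch below works on plain (start, end) tuples on the reference; it only illustrates the interval arithmetic and does not touch Alignment or Map objects. The function name split_around is hypothetical.

```python
def split_around(large, small):
    """Split the large interval into the parts left and right of the small one."""
    large_start, large_end = large
    small_start, small_end = small
    if not (large_start <= small_start and small_end <= large_end):
        raise ValueError("small interval is not contained in the large one")

    parts = []
    if small_start > large_start:
        parts.append((large_start, small_start))  # part before the small map
    if small_end < large_end:
        parts.append((small_end, large_end))      # part after the small map
    return parts


print(split_around((1, 100), (25, 75)))  # [(1, 25), (75, 100)]
```

If the small interval touches one end of the large one, only a single part is returned.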

biscot.Alignment.write_unplaced_contigs(key_dict, contigs_sequence_dict, scaffolded_maps)[source]

Incorporates contigs that weren’t scaffolded into the AGP file

Parameters
  • key_dict (dict((int, int, int), (str, int, int, int))) – Dict containing the correspondence between contigs and contig maps

  • contigs_sequence_dict (dict(str, str)) – Dict containing fasta sequences

  • scaffolded_maps (list(int)) – List containing contig map ids that were scaffolded