{% extends "layoutabout.html" %} {% block content %}

XspecT and ClAssT


The 'Acinetobacter species Assignment Tool' (XspecT) and the integrated 'Clone-type Assignment Tool' (ClAssT) are fast, easy-to-use tools that taxonomically assign Acinetobacter input data using Bloom filters. XspecT performs taxonomic assignment at the Acinetobacter species level, while ClAssT uses strain typing to identify international clones of A. baumannii.


About Acinetobacter

The genus Acinetobacter is a ubiquitous and highly diverse group of aerobic, gram-negative bacteria comprising about 94 species. They occur in many ecological niches such as soil, water, and animals. They can survive on dry surfaces with limited nutrients and are transmitted in both natural and medical environments.

The taxonomic assignment of Acinetobacter species is difficult for several reasons. One of them is the high similarity between closely related species, especially those of the A. calcoaceticus-A. baumannii complex (ACB complex). Five of the six species of the ACB complex are pathogens that can cause nosocomial infections. It is therefore important to assign Acinetobacter genomes to the correct species.

{{ 'A. baumannii' }}

Acinetobacter baumannii is a bacterium that can cause nosocomial infections and develop multidrug resistance. It is therefore a threat to hospitalized patients with weakened immune systems.

There are 8 'International Clonetypes' of A. baumannii, which differ in virulence and epidemicity. Identifying the clone type involved is needed to deal with an outbreak. Current methods, such as 'Multi Locus Sequence Typing (MLST)', are very time-consuming.

To identify an A. baumannii International Clone, we developed the 'Clone-type Assignment Tool (ClAssT)', which accepts sequence reads or an assembled genome as input data for fast and accurate classification. Assembling sequence reads is also very time-consuming, but with this tool that step can be skipped. ClAssT needs only a few sequence reads for a classification. To avoid a long upload of a large read file, the tool reads your input file client-side and sends only the required data to the server for further processing.

{{ 'About XspecT' }}

XspecT is similar to ClAssT. For taxonomic assignment at the species level, the input data is processed with the same file reader, and the Bloom filters are searched through the same modules. For each species, up to 4 different assembled genomes (if available) were used as reference data for the Bloom filters. Only a small number of k-mers is needed for the species assignment: XspecT uses only every 500th k-mer of an assembled genome and only every 10th k-mer of a sequence read.
The Support Vector Machine (SVM) was further improved by using the radial basis function as the kernel function with a regularization parameter of C = 1.5.
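The k-mer subsampling and Bloom filter lookup described above can be illustrated with a minimal Python sketch. It is not the XspecT implementation: the `BloomFilter` class, the helper names, the hash scheme, the filter size, and the choice to start sampling at the first k-mer are all assumptions made for illustration. The k-mer length `k` is left as a parameter because it is not stated here.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not XspecT's own code)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 digest (assumed scheme).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def sampled_kmers(sequence, k, step):
    """Yield every `step`-th k-mer of `sequence`, starting with the first one."""
    for start in range(0, len(sequence) - k + 1, step):
        yield sequence[start:start + k]


def filter_score(sequence, bloom, k, step):
    """Fraction of sampled k-mers that the Bloom filter reports as present."""
    kmers = list(sampled_kmers(sequence, k, step))
    if not kmers:
        return 0.0
    return sum(kmer in bloom for kmer in kmers) / len(kmers)
```

Under these assumptions, an assembled genome would be scored with `step=500` and a sequence read with `step=10`. A Bloom filter can report false positives but never false negatives, so a sequence whose k-mers were all added to the filter always scores 1.0.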

This tool was tested and evaluated with over 3600 Acinetobacter genomes and reached a high accuracy of 99.52%. Nevertheless, we accept no liability for false results or classifications.

The source code can be found here on GitHub.

How it Works

Workflow


Reference-data

Used reference-data

The tool uses a set of up to 4 assemblies for each species.
The training data used for the SVM can be found here.

Species NCBI RefSeq Assembly Accession (GCF)
A. albensis GCF_900095025.1, GCF_015209685.1
A. apis GCF_900197575.1
A. baretiae GCF_015627105.1, GCF_015627115.1
A. baumannii GCF_000453565.1, GCF_000453645.1, GCF_000453665.1, GCF_000453685.1
A. baylyi GCF_000302115.1, GCF_000368685.1, GCF_000621045.1, GCF_001485005.1
A. beijerinckii GCF_000368985.1, GCF_000369005.1, GCF_000931715.1
A. bereziniae GCF_000248295.1, GCF_000368505.1, GCF_001055215.1, GCF_001500155.1
A. bohemicus GCF_000367925.1, GCF_900116265.1
A. boissieri GCF_900096955.1
A. bouvetii GCF_000368865.1, GCF_000373725.1, GCF_001485025.1
A. brisouii GCF_000368645.1, GCF_000488275.1, GCF_000931655.1, GCF_000964015.1
A. calcoaceticus GCF_000162035.1, GCF_000818215.1, GCF_000931735.1, GCF_001510805.1
A. celticus GCF_001707755.1
A. chengduensis GCF_003664645.1
A. chinensis GCF_002165375.2
A. colistiniresistens GCF_000413935.1
A. courvalinii GCF_000369605.1, GCF_000369785.1, GCF_000580655.1
A. cumulans GCF_003024525.3, GCF_003611525.1, GCF_003611535.1, GCF_003611575.1
A. defluvii GCF_001704615.1, GCF_013072655.1
A. dispersus GCF_009884975.1
A. equi GCF_001307195.1
A. GS06 GCF_000367985.1, GCF_000369505.1
A. GS16 GCF_000764915.1, GCF_000369525.1, GCF_000368445.1
A. guerrae GCF_003611455.1, GCF_009014115.1, GCF_009372255.1
A. gandensis GCF_001678755.1, GCF_008802205.1
A. gerneri GCF_000368565.1, GCF_000430245.1, GCF_000747725.1
A. guillouiae GCF_000368485.1, GCF_000414055.1, GCF_000829655.1, GCF_002370525.1
A. gyllenbergii GCF_000414075.1, GCF_000488195.1, GCF_000931695.1, GCF_001682515.1
A. haemolyticus GCF_000369085.1, GCF_000430205.1, GCF_000830135.1, GCF_000302315.1
A. halotolerans GCF_004208515.1
A. harbinensis GCF_000816495.1
A. idrijaensis GCA_905480265.1
A. indicus GCF_000488255.1, GCF_000830155.1, GCF_000964095.1, GCF_001922625.1
A. johnsonii GCF_000302335.1, GCF_000368045.1, GCF_000368805.1, GCF_000949655.1
A. junii GCF_000368765.1, GCF_000430225.1, GCF_000775815.1, GCF_000813495.1
A. kanungonis GCF_009939195.1
A. kookii GCF_900096895.1
A. kyonggiensis GCF_900107285.1
A. lactucae GCF_001605885.1, GCF_002076935.1, GCF_001595745.1, GCF_013122135.1
A. lanii GCF_011191955.1, GCF_011578285.1
A. larvae GCF_001704115.1
A. lwoffii GCF_000162095.1, GCF_000248355.1, GCF_000301755.1
A. marinus GCF_900096915.1
A. modestus GCF_014636095.1
A. nectaris GCF_000488215.1
A. nosocomialis GCF_000472005.1, GCF_000529215.1, GCF_000775795.1, GCF_000775895.1
A. oleivorans GCF_000196795.1, GCF_000488235.1, GCF_000836035.1
A. parvus GCF_000248155.1, GCF_000368005.1, GCF_000368025.1, GCF_000962795.1
A. piscicola GCF_002233755.1, GCF_004152775.1, GCF_015218165.1
A. pittii GCF_000949775.1, GCF_000949795.1, GCF_001005885.1, GCF_001056355.1
A. pollinis GCF_015627175.1, GCF_015627205.1, GCF_015627215.1, GCF_015627235.1
A. populi GCF_002174125.1
A. portensis GCF_009372215.1, GCF_010646905.1, GCF_001605895.1
A. proteolyticus GCF_001753605.1, GCF_002835245.1, GCF_000367945.1
A. pseudolwoffii GCF_0003694451.1, GCF_000369105.1
A. pullicarnis GCF_006352475.1
A. puyangensis GCF_900096995.1
A. qingfengensis GCF_001753595.1, GCF_008693185.1
A. radioresistens GCF_000368885.1, GCF_000368905.1, GCF_000972345.1, GCF_001917365.1
A. rathckeae GCF_015627125.1, GCF_015627165.1
A. rongchengensis GCF_003611475.1
A. rudis GCF_000413895.1, GCF_000829675.1
A. schindleri GCF_000368465.1, GCF_000368625.1, GCF_001485065.1, GCF_001971565.1
A. seifertii GCF_000368065.1, GCF_001054375.1, GCF_001707675.1, GCF_002148925.1
A. seohaensis GCA_018403785.1
A. shaoyimingii GCF_011174715.1, GCF_011578045.1
A. sichuanensis GCF_003024515.2
A. soli GCF_000368725.1, GCF_000633005.1, GCF_000760595.1, GCF_001414585.1
A. stercoris GCF_900323515.1
A. tandoii GCF_000400735.1, GCF_000621065.1, GCF_000760555.1
A. terrae GCF_001647595.1, GCF_002135195.1, GCF_004331165.1, GCF_013004255.1
A. terrestris GCF_004331115.1, GCF_004331155.1, GCF_004331185.1, GCF_013004415.1
A. tianfuensis GCF_003611465.1
A. tjernbergiae GCF_000374425.1, GCF_000488175.1, GCF_000759995.1
A. towneri GCF_000368785.1, GCF_000688495.1, GCF_000760575.1, GCF_001758345.1
A. ursingii GCF_000368845.1, GCF_000369885.1, GCF_000934145.1, GCF_000949815.1
A. variabilis GCF_000369625.1, GCF_000804305.1, GCF_000368385.1, GCF_018409485.1
A. venetianus GCF_000308235.1, GCF_000368585.1, GCF_001484985.1, GCF_001577485.1
A. vivianii GCF_014635885.1, GCF_016502725.1
A. wanghuae GCF_009557235.1, GCF_009601085.1
A. wuhouensis GCF_001696605.3, GCF_002165345.2, GCF_004209115.1, GCF_004209325.1

{{ 'About ClAssT' }}

The code for the assignment process is inspired by BIGSI.

The code used for this thesis project can be found here.

This tool has been evaluated by validating more than 2700 MLST classifications of Acinetobacter genomes. Nevertheless, we accept no liability for false results or classifications.

How it Works

Workflow


Reference-data

Used reference-data

The tool uses a set of assemblies per clone type.

Clonetype NCBI RefSeq Assembly Accession (GCF)
IC1 'GCF_000369185.1', 'GCF_000453105.1', 'GCF_002416345.1', 'GCF_000177695.1', 'GCF_001444225.1', 'GCF_001657725.1', 'GCF_900031715.1', 'GCF_002119355.1', 'GCF_000309275.1', 'GCF_000453045.1', 'GCF_000586635.1', 'GCF_001444255.1', 'GCF_001399655.1', 'GCF_000969385.1'
IC2 'GCF_000809645.3', 'GCF_002762155.1', 'GCF_000580275.1', 'GCF_000810425.3', 'GCF_000589595.1', 'GCF_000309175.1', 'GCF_002183905.1', 'GCF_000811545.3', 'GCF_001669115.1', 'GCF_000939555.2', 'GCF_002277385.1', 'GCF_002183625.1', 'GCF_900118135.1', 'GCF_001433675.1'
IC3 'GCF_000309215.1', 'GCF_001950095.1', 'GCF_002136595.1', 'GCF_001950315.1', 'GCF_001432825.1', 'GCF_000286535.1', 'GCF_000305295.1', 'GCF_001666225.1', 'GCF_002016825.1', 'GCF_000215005.1', 'GCF_000581435.1', 'GCF_000278645.1', 'GCF_000278625.1', 'GCF_002150405.1'
IC4 'GCF_000368245.1', 'GCF_000988155.1', 'GCF_001500225.1', 'GCF_001758085.1', 'BMBF-193'
IC5 'GCF_001433245.1', 'GCF_001433015.1', 'GCF_001414735.1', 'GCF_002143145.1', 'GCF_000302255.1', 'GCF_001415385.1', 'GCF_000515855.1', 'GCF_002137495.1', 'GCF_002018995.1', 'GCF_002183505.1', 'GCF_001649805.1', 'GCF_001432975.1', 'GCF_001649795.1', 'GCF_001433655.1'
IC6 'GCF_000453985.1', 'GCF_900161935.1', 'GCF_001432845.1', 'GCF_900161985.1', 'GCF_900161915.1', 'GCF_000189695.1', 'GCF_000453965.1', 'GCF_900161995.1', 'GCF_000681555.1', 'GCF_000516155.1', 'GCF_900161975.1', 'GCF_900161955.1', 'GCF_900162015.1', 'GCF_900161885.1'
IC7 'GCF_002573805.1', 'GCF_001007725.1', 'GCF_001612105.1', 'GCF_001007705.1', 'GCF_001611995.1', 'GCF_000583375.1', 'GCF_002573915.1', 'GCF_000278665.1', 'GCF_000241705.1', 'GCF_001612095.1', 'GCF_002143975.1', 'GCF_002927855.1', 'GCF_000516575.2', 'GCF_002504145.1'
IC8 'GCF_002137755.1', 'GCF_001864715.1', 'GCF_002573795.1', 'GCF_001441455.1', 'GCF_002017485.1', 'GCF_000588675.1', 'GCF_001441415.1', 'GCF_001441565.1', 'GCF_001861935.1', 'GCF_002183925.1', 'GCF_001578145.1', 'GCF_001441405.1', 'GCF_001908295.1', 'GCF_002144965.1'
Classification with Support Vector Machine

The tool generates a vector in which each element holds the score of a selected filter. To assign the result vector to one of the selected organisms (or to none of them), a machine learning method called 'Support Vector Machine (SVM)' is used. An SVM needs training data for classification; the following vectors were used as training data for your assignment:

{{ svm_table | safe }}
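As a rough illustration of the score vector described above, the following Python sketch assembles one score per filter and compares two vectors with a radial basis function kernel, the kernel type named for the XspecT SVM. It does not train or apply an SVM. The function names, the definition of a score as hits divided by the number of k-mers tested, and the `gamma` value are assumptions for illustration only.

```python
import math


def score_vector(hits, totals, filter_names):
    """Return one score per filter, in the order given by filter_names.

    Assumes a score is the fraction of tested k-mers found in that filter.
    """
    return [hits[name] / totals[name] if totals[name] else 0.0 for name in filter_names]


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical hit counts for three clone-type filters.
filters = ["IC1", "IC2", "IC3"]
hits = dict(IC1=90, IC2=10, IC3=0)
totals = dict(IC1=100, IC2=100, IC3=100)
vec = score_vector(hits, totals, filters)
```

The kernel value is 1.0 for identical vectors and falls toward 0 as they move apart, which is how an RBF-kernel SVM relates a result vector to its training vectors.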
bla-OXA genes

The following bla-OXA families are currently available:

{% for id in oxa_ids%}

{{ id }}

{% endfor %}

How-to-use


Workflow

  1. Select your sequence reads or assembled genome as input data. Allowed formats are fna/fasta and fastq.
    Adjust the options: Oxa-Genes = screen for OXA families; Quick-Mode (only for fna/fasta) = fast search that uses fewer k-mers (default mode); Sequence-Reads = number of reads that will be used (5000 by default).
  2. The result of the taxonomic assignment will be shown here.
  3. A detailed results plot for the species assignment/strain typing will be shown here.
    The other tabs include a plot for all OXA genes (if the option was selected) and a literature search with relevant PubMed papers on the assigned species.
    All plots can be downloaded with the download button at the bottom of each plot.
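The effect of the Sequence-Reads option can be sketched in Python as follows. This is only an illustration: the tool itself reads the input file client-side, and the function name, the four-line FASTQ layout check, and the error handling are assumptions rather than the tool's own code.

```python
import itertools


def read_fastq_sequences(handle, max_reads=5000):
    """Yield the sequence line of up to max_reads FASTQ records.

    Assumes the common four-line record layout: header, sequence, '+', quality.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    count = 0
    while count < max_reads:
        record = list(itertools.islice(lines, 4))
        if len(record) < 4:
            break  # end of file or truncated final record
        header, sequence, separator, _quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("not a four-line FASTQ record")
        yield sequence
        count += 1
```

Because the function is a generator, reading stops once `max_reads` records have been yielded, so the rest of a large file is never parsed.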

{{ 'About this website' }}


XspecT and ClAssT are provided by the Department for Applied Bioinformatics in Frankfurt (Germany).

XspecT is a bachelor thesis project of Dominik Sens.
ClAssT is a bachelor thesis project of Sam U. Gimbel.

Charts are made with Chart.js.

{% endblock content %}