BpForms is a toolkit for unambiguously representing the primary sequences of biopolymer forms. By representing these sequences concretely, BpForms aims to facilitate precise discussion of DNA modification, post-transcriptional processing, and post-translational processing; help determine the structures of biopolymer forms; facilitate the integration of data about these processes; and enable whole-cell models that represent both these processes and the functions of modified DNA, RNA, and proteins.
BpForms includes a notation for describing biopolymer forms, as well as this website, a JSON REST API, a command-line interface, and a Python API for calculating properties of biopolymer forms. These tools are open source and available under the MIT license.
The BpForms notation represents biopolymers as FASTA sequences augmented with (a) multiple-letter alphabet-defined monomers delimited by curly brackets and (b) user-defined monomers described in square brackets by one or more attributes separated by "|". The structure, monomer-bond-atom, monomer-displaced-atom, left-bond-atom, left-displaced-atom, right-bond-atom, and right-displaced-atom attributes are required to calculate the chemical formula, molecular weight, and charge. All other attributes are optional.
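To make the three kinds of monomer concrete, the following is a minimal Python sketch of how such a string could be split into monomer tokens. It is an illustration only, not the parser BpForms itself uses. The function name `tokenize`, the token labels, the example sequence, and the `name: "value"` form of the attributes are all assumptions made for this sketch; the only rules taken from the description above are that single letters are monomers, curly brackets delimit multiple-letter alphabet-defined monomers, and square brackets delimit user-defined monomers whose attributes are separated by "|".

```python
def tokenize(seq):
    """Split a BpForms-like string into (kind, value) monomer tokens.

    Illustrative sketch only; this is not the BpForms parser.
    """
    tokens = []
    i = 0
    while i < len(seq):
        ch = seq[i]
        if ch == "{":
            # Multiple-letter alphabet-defined monomer, e.g. {m2A}
            end = seq.index("}", i)  # raises ValueError if unterminated
            tokens.append(("alphabet", seq[i + 1:end]))
            i = end + 1
        elif ch == "[":
            # User-defined monomer: attributes separated by "|".
            # Assumption: attribute values may be double-quoted, and
            # "|" or "]" inside quotes do not act as delimiters.
            attrs, current, in_quotes = [], "", False
            i += 1
            while i < len(seq) and (in_quotes or seq[i] != "]"):
                c = seq[i]
                if c == '"':
                    in_quotes = not in_quotes
                if c == "|" and not in_quotes:
                    attrs.append(current.strip())
                    current = ""
                else:
                    current += c
                i += 1
            if i >= len(seq):
                raise ValueError("unterminated user-defined monomer")
            attrs.append(current.strip())
            tokens.append(("user", attrs))
            i += 1  # skip the closing "]"
        elif ch.isspace():
            i += 1
        else:
            # Ordinary single-letter monomer, as in a FASTA sequence
            tokens.append(("alphabet", ch))
            i += 1
    return tokens


# Hypothetical example sequence; the monomer code and attributes are made up.
example = 'AC{m2A}G[id: "x" | structure: "C[O-]"]'
for kind, value in tokenize(example):
    print(kind, value)
```

Tracking quotes matters because chemical structure strings can themselves contain square brackets, so a naive search for the next "]" would cut such a monomer short.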
BpForms has several pre-built alphabets.