jax-smfsb: A python library for stochastic systems biology modelling and inference

Author: Darren J. Wilkinson

Published: September 19, 2024

Summary

Many biological processes, and especially molecular biochemical processes, exhibit non-trivial stochasticity in their dynamical behaviour (Wilkinson 2009). The popular textbook Stochastic modelling for systems biology, third edition (Wilkinson 2018) describes the stochastic approach to modelling and simulation of biochemical processes, and how to do Bayesian inference for the parameters of such models using time course data (Golightly and Wilkinson 2011). jax-smfsb provides a fast and efficient implementation of all of the algorithms described in Wilkinson (2018), able to effectively exploit multiple cores and GPUs, leading to performance suitable for the analysis of non-trivial research problems.

Statement of Need

Many tools exist for modelling biological network dynamics using deterministic approaches, typically based on ordinary differential equations (ODEs). By contrast, there are relatively few flexible software libraries for modelling and simulation of stochastic biochemical networks; libRoadRunner (Welsh et al. 2022) and SBSCL (Panchiwala et al. 2021) are notable examples. There are even fewer libraries for principled (fully Bayesian) inference for the parameters of such networks using data.

In addition to describing the mathematical framework for stochastic modelling, simulation, and inference, Wilkinson (2018) also describes a software implementation of all of the algorithms. The language chosen to illustrate the implementation was R (R Core Team 2024), and the library is available as the package smfsb (Wilkinson 2024). While this library is of significant pedagogical value, the overheads of dynamic interpreted languages such as R make it unsuitable for developing the high-performance code needed for non-trivial research problems. An implementation in the compiled, strongly typed functional language Scala (Odersky et al. 2004), scala-smfsb (Wilkinson 2019), partially addresses this issue, but the scarcity of systems biology students and researchers familiar with Scala has limited the impact of this library. More recently, a Python (Van Rossum and Drake 2009) port of the library, python-smfsb (Wilkinson 2023), has been developed, utilising the Python libraries numpy (Harris et al. 2020) and scipy (Virtanen et al. 2020). This port is also of significant pedagogical value, since Python has become a more popular programming language for systems biology modelling than R. Nevertheless, its performance is similar to that of the R library, and so it too is inadequate for serious research problems.

jax-smfsb addresses all of the limitations of the previously described implementations. It is essentially a port of python-smfsb with numpy and scipy replaced by JAX (Bradbury et al. 2018). JAX is a state-of-the-art high-performance machine learning framework that turns out to be well-suited to a range of problems in numerical, scientific and statistical computing. JAX is effectively a functional language for differentiable array processing embedded in Python, allowing just-in-time compilation and execution on modern hardware with state-of-the-art performance. In addition to a large number of machine learning libraries based on JAX, a growing ecosystem of libraries for scientific computing is developing; see, for example, diffrax (Kidger 2021), JAX-MD (Schoenholz and Cubuk 2021), and JAX-Fluids (Bezgin, Buhendwa, and Adams 2023). jax-smfsb adds to this ecosystem by providing tools for modelling, simulation and Bayesian inference for stochastic (biochemical) network models.
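To illustrate the programming style this implies, the following is a minimal sketch using only core JAX functions (jax.random, jax.lax.scan, jax.vmap and jax.jit). It is not code from jax-smfsb: the function name simulate_path, the choice of an Euler-Maruyama diffusion approximation to an immigration-death process, and all rate values are assumptions made purely for illustration.

```python
import jax
import jax.numpy as jnp


def simulate_path(key, x0, n_steps=100, dt=0.1, lam=10.0, mu=0.1):
    """One Euler-Maruyama path of a diffusion approximation to an
    immigration-death process (lam and mu are hypothetical rates)."""
    x0 = jnp.asarray(x0, dtype=jnp.float32)

    def step(x, k):
        drift = lam - mu * x
        diffusion = jnp.sqrt(lam + mu * x)
        noise = jax.random.normal(k)
        x_new = x + drift * dt + diffusion * jnp.sqrt(dt) * noise
        x_new = jnp.maximum(x_new, 0.0)  # crude fix: clip the state at zero
        return x_new, x_new

    # Randomness is explicit: one PRNG key per time step
    keys = jax.random.split(key, n_steps)
    _, path = jax.lax.scan(step, x0, keys)
    return path


# Vectorise over independent PRNG keys, then just-in-time compile the result
simulate_many = jax.jit(jax.vmap(simulate_path, in_axes=(0, None)))

keys = jax.random.split(jax.random.PRNGKey(42), 1000)
paths = simulate_many(keys, 50.0)
print(paths.shape)  # (1000, 100)
```

Because the simulation is a pure function of its PRNG key, running many replicates in parallel needs no change to simulate_path itself, and the same compiled function can execute on a CPU or GPU.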

Features

Like the NumPy version of the library, jax-smfsb provides a small library of pre-defined models and allows direct specification of models as stochastic Petri nets using arrays. It can also parse models encoded in the Systems Biology Markup Language (SBML) (Keating et al. 2020) using libSBML (Bornstein et al. 2008), as well as models encoded in the SBML-shorthand notation used in Wilkinson (2018). Exact and approximate stochastic simulation algorithms are provided for both the well-mixed and spatial (reaction-diffusion) cases. Exact and approximate Bayesian algorithms for parameter inference are provided, based on ABC (Marin et al. 2011), ABC-SMC (Toni et al. 2008) or particle MCMC (Andrieu, Doucet, and Holenstein 2010) approaches.
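As a rough illustration of what specifying a model as a stochastic Petri net using arrays and simulating it exactly can look like, the sketch below encodes a Lotka-Volterra predator-prey network as pre and post matrices and applies the Gillespie algorithm in plain JAX. It does not use the jax-smfsb API: the names hazard, step_gillespie and sim_time_series, the rate constants and the initial state are all assumptions chosen for the example.

```python
import jax
import jax.numpy as jnp

# Lotka-Volterra as a stochastic Petri net: rows are reactions, columns are
# species (prey, predator). All numerical values here are illustrative.
pre = jnp.array([[1, 0], [1, 1], [0, 1]], dtype=jnp.int32)
post = jnp.array([[2, 0], [0, 2], [0, 0]], dtype=jnp.int32)
stoich = post - pre  # net change in each species when a reaction fires
rates = jnp.array([1.0, 0.005, 0.6])


def hazard(x):
    """Mass-action reaction hazards for the state x = (prey, predator)."""
    xf = x.astype(jnp.float32)
    return rates * jnp.stack([xf[0], xf[0] * xf[1], xf[1]])


def step_gillespie(key, x0, t0, dt):
    """Advance the state exactly from time t0 to time t0 + dt."""
    t_end = t0 + dt

    def advance(state):
        key, _, x, t = state
        key, k_time, k_react = jax.random.split(key, 3)
        h = hazard(x)
        h0 = jnp.sum(h)
        # Time of the next event; infinite if no reaction can fire
        t_next = jnp.where(
            h0 > 0, t + jax.random.exponential(k_time) / h0, jnp.inf
        )
        # Choose a reaction with probability proportional to its hazard
        j = jax.random.categorical(k_react, jnp.log(h))
        return key, x, x + stoich[j], t_next

    # state = (key, state before latest event, state after it, event time)
    init = (key, x0, x0, t0)
    final = jax.lax.while_loop(lambda s: s[3] < t_end, advance, init)
    return final[1]  # the latest event overshot t_end, so discard it


def sim_time_series(key, x0, n_steps, dt):
    """Record the state at the n_steps times dt, 2*dt, ..., n_steps*dt."""

    def body(carry, k):
        x, t = carry
        x = step_gillespie(k, x, t, dt)
        return (x, t + dt), x

    keys = jax.random.split(key, n_steps)
    _, out = jax.lax.scan(body, (x0, jnp.float32(0.0)), keys)
    return out


x0 = jnp.array([50, 100], dtype=jnp.int32)
out = sim_time_series(jax.random.PRNGKey(0), x0, n_steps=20, dt=0.5)
print(out.shape)  # (20, 2): prey and predator counts at each time point
```

Restarting the algorithm with a fresh key at each recording time is valid because the waiting times are exponential and hence memoryless. Expressing the event loop with jax.lax.while_loop rather than a Python loop is what allows the whole simulation to be compiled.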

Many demos are provided in the demos directory. These are direct equivalents of the demos provided in the python-smfsb library, so it is easy to compare the performance of the Python+JAX implementation with that of the more conventional Python+NumPy implementation. Precise timings depend heavily on both the hardware and the problem, but on a high-specification machine, speed-ups of around two orders of magnitude can be expected for computationally intensive simulation or inference tasks. On a Linux server with an Intel i7-12700 processor, the CPU-only version of jax-smfsb gives speed-up factors relative to python-smfsb ranging from around 50 to around 2000 on the timing examples included in the demos directory.
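When making timing comparisons of this kind, it helps to separate compilation time from execution time and to wait for JAX's asynchronous dispatch to complete. The sketch below shows one way to do this. It is not taken from the demos directory, and the workload function is a hypothetical stand-in for a real simulation or inference task.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def workload(key):
    """Hypothetical stand-in task: end points of 1,000 Gaussian random walks."""
    steps = jax.random.normal(key, (1_000, 1_000))
    return jnp.cumsum(steps, axis=1)[:, -1]


key = jax.random.PRNGKey(0)

# The first call includes tracing and compilation, so time it separately
start = time.perf_counter()
workload(key).block_until_ready()
first_call = time.perf_counter() - start

# Later calls reuse the compiled code; block_until_ready waits for the
# asynchronously dispatched computation to finish before the clock stops
start = time.perf_counter()
result = workload(key).block_until_ready()
later_call = time.perf_counter() - start

print(f"first call: {first_call:.4f}s, later call: {later_call:.4f}s")
```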

References

Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” Journal of the Royal Statistical Society Series B: Statistical Methodology 72 (3): 269–342. https://doi.org/10.1111/j.1467-9868.2009.00736.x.
Bezgin, Deniz A., Aaron B. Buhendwa, and Nikolaus A. Adams. 2023. “JAX-Fluids: A Fully-Differentiable High-Order Computational Fluid Dynamics Solver for Compressible Two-Phase Flows.” Computer Physics Communications 282 (January): 108527. https://doi.org/10.1016/j.cpc.2022.108527.
Bornstein, Benjamin J., Sarah M. Keating, Akiya Jouraku, and Michael Hucka. 2008. “LibSBML: An API Library for SBML.” Bioinformatics 24 (6): 880–81. https://doi.org/10.1093/bioinformatics/btn051.
Bradbury, James, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, et al. 2018. JAX: Composable Transformations of Python+NumPy Programs (version 0.3.13). http://github.com/jax-ml/jax.
Golightly, Andrew, and Darren J. Wilkinson. 2011. “Bayesian Parameter Inference for Stochastic Biochemical Network Models Using Particle Markov Chain Monte Carlo.” Interface Focus 1 (6): 807–20. https://doi.org/10.1098/rsfs.2011.0047.
Harris, Charles R., K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, et al. 2020. “Array Programming with NumPy.” Nature 585 (7825): 357–62. https://doi.org/10.1038/s41586-020-2649-2.
Keating, Sarah M, Dagmar Waltemath, Matthias König, Fengkai Zhang, Andreas Dräger, Claudine Chaouiya, Frank T Bergmann, et al. 2020. “SBML Level 3: An Extensible Format for the Exchange and Reuse of Biological Models.” Molecular Systems Biology 16 (8). https://doi.org/10.15252/msb.20199110.
Kidger, Patrick. 2021. “On Neural Differential Equations.” PhD thesis, University of Oxford.
Marin, Jean-Michel, Pierre Pudlo, Christian P. Robert, and Robin J. Ryder. 2011. “Approximate Bayesian Computational Methods.” Statistics and Computing 22 (6): 1167–80. https://doi.org/10.1007/s11222-011-9288-2.
Odersky, Martin, Philippe Altherr, Vincent Cremet, Burak Emir, Sebastian Maneth, Stéphane Micheloud, Nikolay Mihaylov, Michel Schinz, Erik Stenman, and Matthias Zenger. 2004. “An Overview of the Scala Programming Language.”
Panchiwala, Hemil, Shalin Shah, Hannes Planatscher, Mykola Zakharchuk, Matthias König, and Andreas Dräger. 2021. “The Systems Biology Simulation Core Library.” Edited by Jinbo Xu. Bioinformatics 38 (3): 864–65. https://doi.org/10.1093/bioinformatics/btab669.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Schoenholz, Samuel S, and Ekin D Cubuk. 2021. “JAX-MD: A Framework for Differentiable Physics.” Journal of Statistical Mechanics: Theory and Experiment 2021 (12): 124016. https://doi.org/10.1088/1742-5468/ac3ae9.
Toni, Tina, David Welch, Natalja Strelkowa, Andreas Ipsen, and Michael P. H Stumpf. 2008. “Approximate Bayesian Computation Scheme for Parameter Inference and Model Selection in Dynamical Systems.” Journal of The Royal Society Interface 6 (31): 187–202. https://doi.org/10.1098/rsif.2008.0172.
Van Rossum, Guido, and Fred L. Drake. 2009. Python 3 Reference Manual. Scotts Valley, CA: CreateSpace.
Virtanen, Pauli, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, et al. 2020. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python.” Nature Methods 17: 261–72. https://doi.org/10.1038/s41592-019-0686-2.
Welsh, Ciaran, Jin Xu, Lucian Smith, Matthias König, Kiri Choi, and Herbert M Sauro. 2022. “libRoadRunner 2.0: A High Performance SBML Simulation and Analysis Library.” Edited by Pier Luigi Martelli. Bioinformatics 39 (1). https://doi.org/10.1093/bioinformatics/btac770.
Wilkinson, Darren J. 2009. “Stochastic Modelling for Quantitative Description of Heterogeneous Biological Systems.” Nature Reviews Genetics 10 (2): 122–33. https://doi.org/10.1038/nrg2509.
———. 2018. Stochastic Modelling for Systems Biology, Third Edition. Chapman & Hall/CRC Press. https://doi.org/10.1201/9781351000918.
———. 2019. “Scala-Smfsb.” GitHub Repository. GitHub. https://github.com/darrenjw/scala-smfsb.
———. 2023. “Python-Smfsb.” GitHub Repository. GitHub. https://github.com/darrenjw/python-smfsb.
———. 2024. Smfsb: Stochastic Modelling for Systems Biology. https://doi.org/10.32614/cran.package.smfsb.