g3mclass development?
Numerous studies show that the results of biomedical tests vary randomly among samples from populations diagnosed with the same disease. The best-known example of a random variable in oncology is human epidermal growth factor receptor 2 (HER2), which is reported to be expressed at higher than normal levels in breast (15-30%), gastric and gastroesophageal (10-30%), ovarian (20-25%), endometrial (14-80%), bladder (23-80%), and lung (up to 20%) cancers. Statistical models incorporating the probability distribution of a random biological variable could be useful for understanding disease progression, developing targeted therapies, and improving patient outcomes.
g3mclass?
The real-world readouts of laboratory tests rarely fit a single normal (Gaussian) distribution, and for many analytes a Gaussian distribution is not achieved even after data transformation. Furthermore, when comparing the results from test (with disease) and reference (without disease) samples, the two distributions of measured values are rarely completely separated. This creates a methodological dilemma in choosing a diagnostic cutoff value, a choice that affects clinical decisions. The paper published in Cancer Research, 2019; 79:3492-502 offered a potential solution for unmixing quantitative assay data using a Bayesian approach and a Gaussian Mixture Model (GMM). The proposed probabilistic classifier has been validated on datasets of more than 300 clinical samples and has been shown to improve on the rule-based binary classification of tumor markers. This inspired the development of g3mclass, software that automates this method and adds further capabilities in a graphical user interface (GUI).
g3mclass is unique?
Unlike other statistical programs aimed at statistical hypothesis testing, g3mclass performs probabilistic statistical classification of each dataset variable into as many categories as are probable. As an advanced analytical tool, it is more informed than rule-based classification and thus may improve the statistical analysis of data from quantitative molecular assays. Focused on Bayesian statistics, g3mclass offers three classification approaches.
g3mclass?
g3mclass aims to ease adoption of the probabilistic classifier in research, biomedical pharma, companion diagnostics, and ultimately in healthcare. Currently, it is intended for basic and translational biomedical research, to help scientists accelerate the evaluation of candidate biomarkers and therapeutic targets.
What does g3mclass do?
g3mclass is classification and visualization software purpose-built for modeling molecular assay data sampled from healthy (reference) and diseased (test) populations. Additional query samples (e.g. suspected disease) obtained with the same assays may also be classified.
How does g3mclass work?
After uploading a file with the reference, test, and optional query datasets, the user may set the model parameters (defaults or user-chosen) and immediately learn the test GMM, which is depicted in a plot. The model is learned with the expectation-maximization (EM) algorithm, initialized with classes that correspond to peaks in the histogram calculated on the test sample. The class labeled '0' has the same mean value and standard deviation as the coupled reference sample. It is imposed on the test GMM; however, its weight in the model is not fixed. If the mean values of other modeled classes are lower or higher than that of class '0', then they are labeled with negative (e.g. -1, -2, etc.) or positive (e.g. 1, 2, etc.) integers, respectively. The larger the absolute value of the integer, the further the class is positioned from class '0'. The g3mclass GUI allows the user to set parameters for model learning: choose either a fixed or a variable number of bins in the histogram, dismiss classes with too few samples, and fuse classes that are too close. The preferred model is selected automatically as the one with the lowest Bayesian information criterion (BIC). Upon model selection, the classification results are presented in spreadsheets and heatmaps for the test, reference, and query (if present) samples. The classification of the reference and all queries is based on the corresponding test model.
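The model-learning idea described above can be illustrated with a minimal Python sketch. It is not the g3mclass implementation: the function names, the starting means (stand-ins for histogram peaks), the fixed iteration count, and the way free parameters are counted for the BIC are all assumptions made for illustration. The sketch pins class '0' to the reference mean and standard deviation, leaves its weight free, fits the remaining classes by EM, keeps the candidate with the lowest BIC, and labels classes by their position relative to class '0'.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_gmm_fixed_class0(test, ref_mean, ref_sd, init_means, n_iter=200):
    """EM for a 1-D GMM whose first class is pinned to the reference
    mean and sd; only its weight is estimated (hypothetical helper)."""
    x = np.asarray(test, dtype=float)
    extra = np.asarray(init_means, dtype=float)
    means = np.concatenate(([ref_mean], extra))
    sds = np.concatenate(([ref_sd], np.full(extra.size, x.std())))
    weights = np.full(means.size, 1.0 / means.size)
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each sample
        dens = weights * normal_pdf(x[:, None], means, sds)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: all weights are updated, class 0 keeps its mean and sd
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        upd_means = (resp * x[:, None]).sum(axis=0) / nk
        upd_sds = np.sqrt((resp * (x[:, None] - upd_means) ** 2).sum(axis=0) / nk)
        means[1:] = upd_means[1:]
        sds[1:] = np.maximum(upd_sds[1:], 1e-6)
    loglik = np.log((weights * normal_pdf(x[:, None], means, sds)).sum(axis=1)).sum()
    # Assumed parameter count: per non-zero class, one weight, one mean, one sd
    n_free = 3 * extra.size
    bic = n_free * np.log(x.size) - 2.0 * loglik
    return weights, means, sds, bic

def label_classes(means):
    """Class 0 is means[0]; lower means get -1, -2, ..., higher get 1, 2, ..."""
    means = np.asarray(means, dtype=float)
    labels = np.zeros(means.size, dtype=int)
    order = np.argsort(means)
    lower = [i for i in order if means[i] < means[0]]
    higher = [i for i in order if means[i] > means[0]]
    for rank, i in enumerate(reversed(lower), start=1):
        labels[i] = -rank
    for rank, i in enumerate(higher, start=1):
        labels[i] = rank
    return labels

# Synthetic data for illustration only
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 150)
test = np.concatenate((rng.normal(0.0, 1.0, 150), rng.normal(8.0, 1.0, 150)))

# Hypothetical candidate starts: class 0 alone, or class 0 plus one peak near 8
candidates = [[], [8.0]]
fits = [fit_gmm_fixed_class0(test, reference.mean(), reference.std(), c)
        for c in candidates]
weights, means, sds, bic = min(fits, key=lambda f: f[3])  # lowest BIC wins
labels = label_classes(means)
```

On this synthetic sample the two-class candidate should have the lower BIC, because a single class pinned to the reference cannot explain the values around 8.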
g3mclass performs classification in 3 consecutive steps:
Step 1: Probability-based classification (proba), which is Bayesian classification based on the GMM.
Step 2: Cutoff-based classification (cutoff), where g3mclass calculates the cutoff values between classes as the values that minimize misclassification between adjacent classes given equal weights.
Step 3: More stringent cutoff-based classification (s. cutoff), which increases specificity.
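Steps 1 and 2 can be sketched in Python as follows. This is one possible reading of the description, not the g3mclass code: the function names and the example model parameters are hypothetical, and the cutoff is taken as the point between two adjacent class means where their equally weighted Gaussian densities are equal, which is where the total misclassification of two equally weighted classes is minimal. Step 3 is not sketched because the text does not say how the more stringent cutoff is derived.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def classify_proba(x, weights, means, sds):
    """Step 1 sketch: index of the class with the highest posterior probability."""
    x = np.asarray(x, dtype=float)
    joint = np.asarray(weights) * normal_pdf(x[:, None], np.asarray(means), np.asarray(sds))
    posterior = joint / joint.sum(axis=1, keepdims=True)
    return posterior.argmax(axis=1), posterior

def equal_weight_cutoff(m1, s1, m2, s2):
    """Step 2 sketch: point between two class means where the equally
    weighted densities cross (assumed interpretation)."""
    if np.isclose(s1, s2):
        return 0.5 * (m1 + m2)  # equal sds: the crossing is the midpoint
    # Equating the two log-densities gives a*x^2 + b*x + c = 0
    a = 1.0 / s1**2 - 1.0 / s2**2
    b = -2.0 * (m1 / s1**2 - m2 / s2**2)
    c = m1**2 / s1**2 - m2**2 / s2**2 + 2.0 * np.log(s1 / s2)
    disc = np.sqrt(b * b - 4.0 * a * c)
    lo, hi = min(m1, m2), max(m1, m2)
    for root in ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)):
        if lo < root < hi:
            return root
    raise ValueError("no density crossing between the two class means")

def classify_cutoff(x, cutoffs):
    """Assign each value to the interval between consecutive cutoffs
    (classes are assumed to be sorted by increasing mean)."""
    return np.searchsorted(np.asarray(cutoffs), np.asarray(x, dtype=float))

# Hypothetical fitted model with three classes sorted by mean
weights = [0.5, 0.3, 0.2]
means = [0.0, 4.0, 9.0]
sds = [1.0, 1.0, 1.0]

values = [-0.3, 3.8, 10.2]
proba_classes, posterior = classify_proba(values, weights, means, sds)
cutoffs = [equal_weight_cutoff(means[i], sds[i], means[i + 1], sds[i + 1])
           for i in range(len(means) - 1)]
cutoff_classes = classify_cutoff(values, cutoffs)
```

The two classifications can disagree near a class boundary, because the posterior in step 1 uses the fitted class weights while the cutoff in this sketch treats adjacent classes as equally weighted.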