Parameters fit with pyABC#
This is a simple example of how to use a package like pyABC to estimate the fundamental parameters of an observed cluster. ASteCA can be combined with any other package of your choosing; here’s a list with many Python-based MCMC and ABC packages.
Loading ASteCA objects#
We start by instantiating an Isochrones object with a PARSEC isochrone. This is an example file but you can use whatever isochrone service fits your needs:
import asteca
# Load isochrones
isochs = asteca.Isochrones(
    model='parsec',
    isochs_path="../_static/parsec/",
    magnitude="Gmag",
    color=("G_BPmag", "G_RPmag"),
    magnitude_effl=6390.7,
    color_effl=(5182.58, 7825.08),
    verbose=2
)
Instantiating isochrones
Model : PARSEC
N_files : 1
N_mets : 3
N_ages : 11
N_isochs : 2000
z range : [0.01, 0.02]
loga range : [7.0, 9.5]
Magnitude : Gmag
Color : G_BPmag-G_RPmag
Isochrone object generated
Next we instantiate a Synthetic object, passing the isochs
object we just created (we are using all defaults for the arguments here):
# Create Synthetic cluster object
synthcl = asteca.Synthetic(isochs, seed=457304, verbose=2)
Instantiating synthetic
Default params : {'met': 0.0152, 'loga': 8.0, 'alpha': 0.09, 'beta': 0.94, 'Rv': 3.1, 'DR': 0.0, 'Av': 0.2, 'dm': 9.0}
Extinction law : CCMO
Diff reddening : uniform
IMF : chabrier_2014
Max init mass : 10000
Gamma dist : D&K
Random seed : 457304
Synthetic clusters object generated
Now we load our observed cluster into a Cluster object:
import pandas as pd
obs_df = pd.read_csv("../_static/cluster.csv")
my_cluster = asteca.Cluster(
    ra=obs_df["RA_ICRS"],
    dec=obs_df["DE_ICRS"],
    magnitude=obs_df["Gmag"],
    e_mag=obs_df["e_Gmag"],
    color=obs_df["BP-RP"],
    e_color=obs_df["e_BP-RP"],
    verbose=2
)
Instantiating cluster
Columns read : RA, DEC, Magnitude, e_mag, Color, e_color
N_stars : 2759
N_clust_min : 25
N_clust_max : 5000
Cluster object generated
We assume that the cluster file loaded into obs_df contains only the most probable members of the cluster. If not, refer to the Membership probabilities tutorial to learn how to obtain membership probabilities and select the most probable members.
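For instance, if your input catalogue already includes a membership-probability column, a simple pandas cut could be applied before building the Cluster object. This is only a sketch, using a hypothetical memb_prob column and file name that are not part of this example’s data:
import pandas as pd
# Hypothetical file and column names, shown only to illustrate the cut
probs_df = pd.read_csv("cluster_with_probs.csv")
members_df = probs_df[probs_df["memb_prob"] > 0.5]  # keep P > 0.5; adjust as needed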
Finally, we need to calibrate our synthcl
object and instantiate a Likelihood object, which will be used to quantify how similar our observed cluster is to the generated synthetic clusters. For both operations we need to pass the my_cluster
object we generated above.
# Calibrate the `synthcl` object
synthcl.calibrate(my_cluster)
# Instantiate the likelihood
likelihood = asteca.Likelihood(my_cluster)
Calibrated observed cluster
N_stars_obs : 2759
Max magnitude : 19.00
Error distribution loaded
Likelihood object generated
Running pyABC#
We are now ready to begin estimating the fundamental parameters with pyABC. The first step is to import pyABC and define the priors for the parameters that are being estimated. The remaining fundamental parameters are not estimated and stay fixed at the default values set when the synthcl object was instantiated (shown in its output above).
Notice that pyABC’s uniform priors follow the scipy.stats convention: they take the minimum value and the desired range (loc and scale), not the maximum value.
import pyabc
met_min, met_max = 0.01, 0.02
loga_min, loga_max = 7.0, 9.5
dm_min, dm_max = 8.0, 10.5
Av_min, Av_max = 0.0, 2.0
# Define a pyABC Distribution(). Uniform distributions are employed for all the parameters
# here but the user can of course change this as desired. See the pyABC docs for more
# information.
priors = pyabc.Distribution(
    {
        "met": pyabc.RV("uniform", met_min, met_max - met_min),
        "loga": pyabc.RV("uniform", loga_min, loga_max - loga_min),
        "dm": pyabc.RV("uniform", dm_min, dm_max - dm_min),
        "Av": pyabc.RV("uniform", Av_min, Av_max - Av_min)
    }
)
pyABC works by minimizing the distance between our data (the observed cluster) and synthetic data (the synthetic clusters). We will need two convenience functions to do this.
The first function required is model()
which takes a dictionary with the fundamental parameters being estimated by pyABC, generates a synthetic cluster via the generate()
method, and returns it as a dictionary. The returned variable is a dictionary simply because this is what pyABC expects; this is not a requirement of ASteCA.
def model(fit_params):
    """Generate a synthetic cluster."""
    # Call the generate() method with the fit_params dictionary
    synth_clust = synthcl.generate(fit_params)
    # pyABC expects a dictionary from this function, so we return a
    # dictionary with a single element.
    return {"data": synth_clust}
We then define a distance() function that returns the likelihood between the observed and synthetic data via the likelihood object. If the method used is plr (Poisson Likelihood Ratio, the default), ASteCA normalizes and inverts the likelihood value. This is required because pyABC minimizes a distance, whereas a likelihood needs to be maximized; inverting it makes the likelihood behave like a distance.
Notice that this function receives two arguments from pyABC but we only require one, so the second argument is discarded. The function makes use of the dictionary generated by the model() function, which contains the synthetic cluster.
def distance(synth_dict, _):
    """The likelihood returned works as a distance which means that the
    optimal value is 0.0.
    """
    return likelihood.get(synth_dict["data"])
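As an optional sanity check, we can evaluate the distance for a single parameter set by hand. This is only a sketch: the parameter values below are arbitrary, and any parameter not listed falls back to the synthcl defaults shown above:
# Arbitrary test values; unlisted parameters fall back to the synthcl defaults
test_params = {"met": 0.015, "loga": 8.0, "dm": 9.0, "Av": 0.5}
# Values closer to 0.0 indicate a better match with the observed cluster
print(distance(model(test_params), None))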
We create an ABCSMC object with the model and distance functions, as well as the priors defined earlier. A population size of 100 is usually enough. The temporary database file defined below is required by pyABC.
# Define pyABC parameters
pop_size = 100
abc = pyabc.ABCSMC(
    model,
    priors,
    distance,
    population_size=pop_size
)
# Define a temporary file required by pyABC
import os
import tempfile
db_path = "sqlite:///" + os.path.join(tempfile.gettempdir(), "pyABC.db")
abc.new(db_path)
ABC.Sampler INFO: Parallelize sampling on 8 processes.
ABC.History INFO: Start <ABCSMC id=8, start_time=2025-04-28 11:05:20>
<pyabc.storage.history.History at 0x7b82a3567dd0>
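As a side note, the database created here can later be re-opened to retrieve the stored results without re-running the analysis. A minimal sketch, assuming pyABC’s History storage class behaves as described in its documentation:
# A sketch (assumes pyabc.History can open an existing database):
# re-load a previous run from the SQLite file created above
stored_history = pyabc.History(db_path)
stored_df, stored_w = stored_history.get_distribution()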
Finally, we can run pyABC to perform approximate Bayesian inference on our parameters. We set a small number of populations here as an example. Running for ~5 minutes is usually enough (see pyABC’s documentation on how to set a maximum running time; a sketch is shown after the output below):
history = abc.run(minimum_epsilon=0.01, max_nr_populations=20)
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.95391896e-01.
ABC INFO: Accepted: 100 / 191 = 5.2356e-01, ESS: 1.0000e+02.
ABC INFO: t: 1, eps: 1.19300951e-01.
ABC INFO: Accepted: 100 / 206 = 4.8544e-01, ESS: 6.4605e+01.
ABC INFO: t: 2, eps: 8.77885715e-02.
ABC INFO: Accepted: 100 / 211 = 4.7393e-01, ESS: 7.2239e+01.
ABC INFO: t: 3, eps: 7.04138311e-02.
ABC INFO: Accepted: 100 / 241 = 4.1494e-01, ESS: 8.2630e+00.
ABC INFO: t: 4, eps: 6.37649663e-02.
ABC INFO: Accepted: 100 / 206 = 4.8544e-01, ESS: 6.9823e+01.
ABC INFO: t: 5, eps: 5.16923350e-02.
ABC INFO: Accepted: 100 / 272 = 3.6765e-01, ESS: 8.5641e+01.
ABC INFO: t: 6, eps: 4.19125709e-02.
ABC INFO: Accepted: 100 / 295 = 3.3898e-01, ESS: 8.4727e+01.
ABC INFO: t: 7, eps: 3.48340129e-02.
ABC INFO: Accepted: 100 / 269 = 3.7175e-01, ESS: 6.7201e+01.
ABC INFO: t: 8, eps: 2.76012492e-02.
ABC INFO: Accepted: 100 / 292 = 3.4247e-01, ESS: 8.5830e+01.
ABC INFO: t: 9, eps: 2.34577053e-02.
ABC INFO: Accepted: 100 / 282 = 3.5461e-01, ESS: 7.8710e+01.
ABC INFO: t: 10, eps: 2.14400372e-02.
ABC INFO: Accepted: 100 / 272 = 3.6765e-01, ESS: 3.4105e+01.
ABC INFO: t: 11, eps: 2.00771783e-02.
ABC INFO: Accepted: 100 / 390 = 2.5641e-01, ESS: 7.2748e+01.
ABC INFO: t: 12, eps: 1.88689915e-02.
ABC INFO: Accepted: 100 / 491 = 2.0367e-01, ESS: 6.2440e+01.
ABC INFO: t: 13, eps: 1.81290621e-02.
ABC INFO: Accepted: 100 / 735 = 1.3605e-01, ESS: 5.6430e+01.
ABC INFO: t: 14, eps: 1.75497839e-02.
ABC INFO: Accepted: 100 / 1051 = 9.5147e-02, ESS: 5.5840e+01.
ABC INFO: t: 15, eps: 1.71326583e-02.
ABC INFO: Accepted: 100 / 1288 = 7.7640e-02, ESS: 3.1223e+01.
ABC INFO: t: 16, eps: 1.67048943e-02.
ABC INFO: Accepted: 100 / 1911 = 5.2329e-02, ESS: 5.9202e+01.
ABC INFO: t: 17, eps: 1.63733847e-02.
ABC INFO: Accepted: 100 / 1778 = 5.6243e-02, ESS: 5.3924e+01.
ABC INFO: t: 18, eps: 1.60277728e-02.
ABC INFO: Accepted: 100 / 3463 = 2.8877e-02, ESS: 1.7928e+01.
ABC INFO: t: 19, eps: 1.58940459e-02.
ABC INFO: Accepted: 100 / 8681 = 1.1519e-02, ESS: 3.0782e+01.
ABC INFO: Stop: Maximum number of generations.
ABC.History INFO: Done <ABCSMC id=8, duration=0:00:35.091309, end_time=2025-04-28 11:05:55>
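If you would rather limit the run by wall-clock time than by number of populations, the run above could instead be bounded with a maximum running time. A sketch, assuming pyABC’s run() accepts the max_walltime argument described in its documentation:
from datetime import timedelta
# A sketch (assumes the max_walltime argument of pyABC's run() method):
# stop after ~5 minutes or 50 populations, whichever comes first
history = abc.run(
    minimum_epsilon=0.01,
    max_nr_populations=50,
    max_walltime=timedelta(minutes=5),
)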
Extracting the results#
We are now ready to extract a few important results, along with our fundamental parameters’ estimations.
The first line shows the final minimized distance between our observed cluster and the generated synthetic clusters. Usually a value below 10% (<0.1) means that a reasonably good fit was found. Notice that this rule of thumb applies to an analysis like this one, where pyABC was used.
The following lines extract the DataFrame of the last run and its associated weights. We use the weights to compute the effective sample size. This value depends on the population size used above (pop_size) and should be large enough to allow decent mean/median/standard deviation values to be extracted.
final_dist = pyabc.inference_util.eps_from_hist(history)
print("Final minimized distance: {:.2f} ({:.0f}%)".format(final_dist, 100*final_dist))
# Extract last iteration and weights
df, w = history.get_distribution()
ESS = pyabc.weighted_statistics.effective_sample_size(w)
print("Effective sample size: {:.0f}".format(ESS))
Final minimized distance: 0.02 (2%)
Effective sample size: 31
Finally, the estimate for each fitted parameter is extracted from the DataFrame using the associated weights (again, this is a pyABC-dependent method; other packages will do this differently):
print("\nParameters estimation:")
print("----------------------")
fit_params = {}
for k in df.keys():
    # Extract medians for the fitted parameters
    _median = pyabc.weighted_statistics.weighted_median(df[k].values, w)
    fit_params[k] = _median
    # Extract STDDEV for the fitted parameters
    _std = pyabc.weighted_statistics.weighted_std(df[k].values, w)
    print("{:<5}: {:.3f} +/- {:.3f}".format(k, _median, _std))
Parameters estimation:
----------------------
Av : 0.442 +/- 0.061
dm : 8.249 +/- 0.036
loga : 8.003 +/- 0.058
met : 0.019 +/- 0.002
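Instead of (or in addition to) a weighted median and standard deviation, approximate credible intervals can be extracted from the same distribution. A sketch, assuming the weighted_quantile helper in pyABC’s weighted_statistics module:
# A sketch (assumes pyabc.weighted_statistics.weighted_quantile):
# 16th/84th weighted percentiles as an approximate 68% credible interval
for k in df.keys():
    q16 = pyabc.weighted_statistics.weighted_quantile(df[k].values, w, alpha=0.16)
    q84 = pyabc.weighted_statistics.weighted_quantile(df[k].values, w, alpha=0.84)
    print("{:<5}: [{:.3f}, {:.3f}]".format(k, q16, q84))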
pyABC has many methods to visualize and analyze the results; see Visualization and analysis. Here we show just a few:
pyabc.settings.set_figure_params("pyabc") # for beautified plots
# Matrix of 1d and 2d histograms over all parameters
pyabc.visualization.plot_histogram_matrix(history);

# Credible intervals over time
pyabc.visualization.plot_credible_intervals(history);

We can generate a CMD comparing our observed cluster with the “best fit” synthetic cluster found by pyABC. We use the fit_params dictionary, which contains the medians of the parameter distributions found by pyABC; the parameters that were not fitted remain fixed at the synthcl object’s default values:
# Generate the "best fit" synthetic cluster using these parameters
synth_arr = synthcl.generate(fit_params)
and generate a side-by-side CMD plot that also shows the theoretical isochrone associated with the “best fit” synthetic cluster:
import matplotlib.pyplot as plt
import numpy as np
def cmd_plot(color, mag, label, ax=None):
    """Function to generate a CMD plot"""
    if ax is None:
        ax = plt.subplot(111)
    label = label + f", N={len(mag)}"
    ax.scatter(color, mag, alpha=0.25, label=label)
    ax.legend()
    ax.set_ylim(mag.max() + 1, mag.min() - 1)  # Invert y axis
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
# Observed cluster
cmd_plot(my_cluster.color, my_cluster.mag, "Observed stars", ax1)
# Synthetic cluster
# Boolean mask identifying the binary systems
binary_msk = ~np.isnan(synth_arr[-1])
# Extract magnitude and color
mag, color = synth_arr[0], synth_arr[1]
# Plot single systems
cmd_plot(color[~binary_msk], mag[~binary_msk], "Single systems", ax2)
# Plot binary systems
cmd_plot(color[binary_msk], mag[binary_msk], "Binary systems", ax2)
# Get isochrone associated to the synthetic cluster
isoch_arr = synthcl.get_isochrone(fit_params)
# Plot the isochrone
plt.plot(isoch_arr[1], isoch_arr[0], c="k");
