qp practical example
Alex Malz, Phil Marshall, Eric Charles
In this notebook we use the qp module to study some photo-z PDFs.
First, let's import the packages we need for this notebook.
import numpy as np
import os
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
%matplotlib inline
import qp
Now let's download the data file we need, if we haven't already.
base_url = 'https://slac.stanford.edu/~echarles/qp_example'
if not os.path.exists('qp_test_ensemble.hf5'):
    # -o names the local file, -L follows redirects
    os.system('curl -o %s -L %s/%s' % ('qp_test_ensemble.hf5', base_url, 'qp_test_ensemble.hf5'))
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 45.9M  100 45.9M    0     0  84.8M      0 --:--:-- --:--:-- --:--:-- 84.7M
Now we read the ensemble. Note that we only need to give the name of the data file; the name of the metadata file is assumed, so it does not have to be passed to qp.read().
ens = qp.read('qp_test_ensemble.hf5')
The following cell confirms that the ensemble was read and prints some simple information about it.
# Confirm that we have read the ensembles
print("Ensemble = ", ens)
# Print some simple information about the ensemble
print("Rep = ", ens.gen_class.name)
print("NPDF = ", ens.npdf)
print("Metadata = ", ens.metadata())
Ensemble =  <qp.ensemble.Ensemble object at 0x7f37ec131550>
Rep =  interp
NPDF =  20000
Metadata =  {'pdf_name': ['interp'], 'pdf_version': [0], 'xvals': array([[0.005, 0.015, 0.025, ..., 2.985, 2.995, 3.005]], dtype=float32)}
Now we plot some PDFs from the ensemble.
Note that the first call to plot() specifies the x-axis limits but not the key (i.e., which PDF in the ensemble to plot), so the key defaults to 0 (i.e., the first PDF). The later calls pass axes=axes so that every PDF is drawn on the same axes.
axes = ens.plot(xlim=(0., 0.5), label="PDF 0")
_ = ens.plot(key=1, axes=axes, label="PDF 1")
_ = ens.plot(key=1300, axes=axes, label="PDF 1300")
legend = axes.figure.legend()
Now we extract some information from the ensemble and time how long each operation takes.
# These are the grid points and quantiles at which we will extract values
test_xvals = ens.gen_obj.xvals
test_quantiles = np.linspace(0, 1, 51)
%%time
# Time the pdf evaluation for 20000 PDFs
pdfs = ens.pdf(test_xvals)
CPU times: user 238 ms, sys: 206 ms, total: 444 ms Wall time: 444 ms
%%time
# Time the cdf (cumulative distribution function) evaluation for 20000 PDFs
cdfs = ens.cdf(test_xvals)
CPU times: user 233 ms, sys: 198 ms, total: 431 ms Wall time: 430 ms
%%time
# Time the ppf (percent point function, the inverse of the cdf) evaluation for 20000 PDFs
ppfs = ens.ppf(test_quantiles)
CPU times: user 1.62 s, sys: 23.3 ms, total: 1.64 s Wall time: 1.64 s
%%time
# Time the sf (survival function, 1 - cdf) evaluation for 20000 PDFs
sfs = ens.sf(test_xvals)
CPU times: user 231 ms, sys: 229 ms, total: 460 ms Wall time: 459 ms
%%time
# Time the isf (inverse survival function) evaluation for 20000 PDFs
isfs = ens.isf(test_quantiles)
CPU times: user 1.62 s, sys: 18.8 ms, total: 1.63 s Wall time: 1.63 s
%%time
# Time the generation of 100 samples for each of the 20000 PDFs
samples = ens.rvs(size=100)
CPU times: user 1.72 s, sys: 29.6 ms, total: 1.75 s Wall time: 1.75 s
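For a PDF tabulated on a grid (the 'interp' representation this file uses), the quantities timed above are closely related: the ppf inverts the cdf, sf is 1 - cdf, isf(q) is ppf(1 - q), and random samples can be drawn by inverse-transform sampling. The following NumPy sketch illustrates those relationships for a single hypothetical Gaussian PDF; the grid, the Gaussian and the function names are assumptions, and this is not qp's actual implementation.

```python
import numpy as np

# Hypothetical stand-in for one PDF of an 'interp' ensemble: a Gaussian
# (mean 0.8, width 0.1) tabulated on a 301-point grid.
xvals = np.linspace(0.0, 3.0, 301)
yvals = np.exp(-0.5 * ((xvals - 0.8) / 0.1) ** 2)

# Running integral of the tabulated values (trapezoid rule), starting from 0.
areas = 0.5 * (yvals[1:] + yvals[:-1]) * np.diff(xvals)
ycumul = np.concatenate([[0.0], np.cumsum(areas)])

# Normalize so that the PDF integrates to 1 and the CDF ends at 1.
yvals = yvals / ycumul[-1]
ycumul = ycumul / ycumul[-1]


def pdf(x):
    # Linear interpolation between grid points, zero outside the grid.
    return np.interp(x, xvals, yvals, left=0.0, right=0.0)


def cdf(x):
    # Interpolate the tabulated CDF (an approximation between grid points).
    return np.interp(x, xvals, ycumul, left=0.0, right=1.0)


def ppf(q):
    # Percent point function: invert the CDF by swapping the interpolation axes.
    return np.interp(q, ycumul, xvals)


def sf(x):
    # Survival function: probability of exceeding x.
    return 1.0 - cdf(x)


def isf(q):
    # Inverse survival function: the x at which sf(x) = q.
    return ppf(1.0 - q)


def rvs(size, rng):
    # Inverse-transform sampling: push uniform draws through the ppf.
    return ppf(rng.uniform(size=size))


rng = np.random.default_rng(42)
samples = rvs(10000, rng)
print(cdf(0.9), sf(0.9), ppf(0.5), isf(0.5), samples.mean())
```

Because ppf, isf and rvs all go through the inverted CDF, it is natural to implement them on top of cdf rather than separately.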
%%time
# Convert to a grid using 51 grid points
ens_g51 = qp.convert(ens, 'interp', xvals=np.linspace(0, 3, 51))
CPU times: user 110 ms, sys: 76.1 ms, total: 186 ms Wall time: 186 ms
%%time
# Convert to a grid using 21 grid points
# Note: unlike the 51-point grid above, this grid only covers 0 <= z <= 1
ens_g21 = qp.convert(ens, 'interp', xvals=np.linspace(0, 1, 21))
CPU times: user 52.2 ms, sys: 58.1 ms, total: 110 ms Wall time: 110 ms
/afs/slac.stanford.edu/u/ek/echarles/vol2/vro/software/qp/qp/interp_pdf.py:82: RuntimeWarning: invalid value encountered in true_divide
  self._yvals = (self._yvals.T / self._ycumul[:,-1]).T
/afs/slac.stanford.edu/u/ek/echarles/vol2/vro/software/qp/qp/interp_pdf.py:83: RuntimeWarning: invalid value encountered in true_divide
  self._ycumul = (self._ycumul.T / self._ycumul[:,-1]).T
key = 0
axes_g = ens.plot(key, xlim=(0, 0.5), label="orig")
_ = ens_g51.plot(key, axes=axes_g, label="g51")
_ = ens_g21.plot(key, axes=axes_g, label="g21")
leg_g = axes_g.figure.legend()
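The 21-point grid above only spans 0 <= z <= 1, and the RuntimeWarning it raised comes from dividing each PDF by its total integral (self._ycumul[:,-1]). A plausible explanation, which is an assumption rather than something checked against qp's source, is that some PDFs have no probability on that grid, so their normalization becomes 0/0. The NumPy sketch below reproduces that effect with two hypothetical box-shaped PDFs and a hypothetical regrid() helper.

```python
import numpy as np


def regrid(x_old, y_old, x_new):
    """Resample gridded PDFs (one per row) onto x_new and renormalize each row."""
    # Linear interpolation of every PDF onto the new grid; zero outside the old grid.
    y_new = np.array([np.interp(x_new, x_old, row, left=0.0, right=0.0) for row in y_old])
    # Integral of each resampled PDF over the new grid (trapezoid rule).
    integrals = np.sum(0.5 * (y_new[:, 1:] + y_new[:, :-1]) * np.diff(x_new), axis=1)
    # A PDF with no support on the new grid has integral 0, so 0/0 gives NaN here.
    return (y_new.T / integrals).T


# Two hypothetical box-shaped PDFs: one centred at z = 0.5, one at z = 2.0.
x_old = np.linspace(0.0, 3.0, 301)
centers = np.array([[0.5], [2.0]])
y_old = (np.abs(x_old - centers) < 0.2).astype(float)

# Resample onto a grid covering only 0 <= z <= 1; silence the 0/0 warning.
with np.errstate(invalid="ignore"):
    y_new = regrid(x_old, y_old, np.linspace(0.0, 1.0, 21))

print(np.isnan(y_new).any(axis=1))  # first row finite, second row NaN
```

Choosing a grid that covers the full redshift range of the ensemble, as the 51-point conversion does, avoids this problem.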
%%time
# Convert using 51 quantiles
ens_q51 = qp.convert(ens, 'quant', quants=np.linspace(0.01, 0.99, 51))
CPU times: user 1.59 s, sys: 0 ns, total: 1.59 s Wall time: 1.59 s
%%time
# Convert using 21 quantiles
ens_q21 = qp.convert(ens, 'quant', quants=np.linspace(0.01, 0.99, 21))
CPU times: user 1.54 s, sys: 472 µs, total: 1.54 s Wall time: 1.54 s
key = 0
axes_q = ens.plot(key, xlim=(0, 0.5), label="orig")
_ = ens_q51.plot(key, axes=axes_q, label="q51")
_ = ens_q21.plot(key, axes=axes_q, label="q21")
leg_q = axes_q.figure.legend()
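The quantile representation stores, for each PDF, the redshifts at which its CDF reaches a fixed set of quantile levels. A minimal NumPy sketch of the idea follows. The reconstruction in from_quantiles() is a deliberately simple piecewise-constant one and is not necessarily what qp's 'quant' representation does; both helpers and the Gaussian test PDF are hypothetical.

```python
import numpy as np


def to_quantiles(xvals, yvals, quants):
    """Locations at which the (trapezoid-rule) CDF of a gridded PDF reaches each quantile."""
    areas = 0.5 * (yvals[1:] + yvals[:-1]) * np.diff(xvals)
    cdf = np.concatenate([[0.0], np.cumsum(areas)])
    cdf /= cdf[-1]
    return np.interp(quants, cdf, xvals)


def from_quantiles(quants, locs):
    """Simplified reconstruction: constant density dq/dz between neighbouring quantiles."""
    density = np.diff(quants) / np.diff(locs)
    centers = 0.5 * (locs[1:] + locs[:-1])
    return centers, density


# Hypothetical Gaussian PDF (mean 0.8, width 0.1) on a 301-point grid.
xvals = np.linspace(0.0, 3.0, 301)
yvals = np.exp(-0.5 * ((xvals - 0.8) / 0.1) ** 2)

quants = np.linspace(0.01, 0.99, 51)   # same quantile levels as ens_q51 above
locs = to_quantiles(xvals, yvals, quants)
centers, density = from_quantiles(quants, locs)

# The median (quants[25] = 0.5) sits at the Gaussian mean. Only 98% of the
# probability is represented, since nothing is stored below 1% or above 99%.
print(locs[25], np.sum(density * np.diff(locs)))
```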
%%time
# Convert to a histogram using 51 bins
ens_h51 = qp.convert(ens, 'hist', bins=np.linspace(0, 3.0, 51))
CPU times: user 98.4 ms, sys: 66.8 ms, total: 165 ms Wall time: 165 ms
%%time
# Convert to a histogram using 21 bin edges (20 bins)
ens_h21 = qp.convert(ens, 'hist', bins=np.linspace(0, 3.0, 21))
CPU times: user 63.6 ms, sys: 48.2 ms, total: 112 ms Wall time: 112 ms
key = 0
axes_h = ens.plot(key, xlim=(0, 0.5), label="orig")
_ = ens_h51.plot(key, axes=axes_h, label="h51")
_ = ens_h21.plot(key, axes=axes_h, label="h21")
leg_h = axes_h.figure.legend()
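The histogram representation stores one average density per bin. One simple way to build it from a gridded PDF is to take differences of the CDF at the bin edges and divide by the bin widths. The sketch below assumes, like the comments above, that the values passed as bins are bin edges; to_histogram() and the Gaussian test PDF are hypothetical and this is not qp's code.

```python
import numpy as np


def to_histogram(xvals, yvals, bins):
    """Average density in each bin, from CDF differences at the bin edges."""
    areas = 0.5 * (yvals[1:] + yvals[:-1]) * np.diff(xvals)
    cdf = np.concatenate([[0.0], np.cumsum(areas)])
    cdf /= cdf[-1]
    cdf_at_edges = np.interp(bins, xvals, cdf, left=0.0, right=1.0)
    # Probability in each bin divided by the bin width.
    return np.diff(cdf_at_edges) / np.diff(bins)


# Hypothetical Gaussian PDF (mean 0.8, width 0.1) on a 301-point grid.
xvals = np.linspace(0.0, 3.0, 301)
yvals = np.exp(-0.5 * ((xvals - 0.8) / 0.1) ** 2)

bins = np.linspace(0.0, 3.0, 51)       # 51 edges, as for ens_h51 above
hist = to_histogram(xvals, yvals, bins)
print(len(hist), np.sum(hist * np.diff(bins)))   # 50 bins, total probability 1
```

With 51 edges the bins are 0.06 wide, so a PDF of width 0.1 is resolved only coarsely; with 21 edges (0.15-wide bins) even less so.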
qp also includes spline-based and Gaussian mixture representations. Converting to these forms is much slower, so we first reduce the base ensemble from 20000 PDFs to 100 PDFs.
ens_red = ens[np.arange(100)]
print("Reduced ensemble has %i PDFs" % (ens_red.npdf))
Reduced ensemble has 100 PDFs
We can convert to the spline representation in a few different ways. This particular method (method="xy") evaluates each PDF at a grid of points and then uses those values to construct the spline representation. We do this for two different grids. Note how much slower this conversion is than the ones above: 2.31 s for only 100 PDFs, compared with 186 ms to convert all 20000 PDFs to a 51-point grid.
%%time
# Convert to a spline using 51 grid points
ens_s51 = qp.convert(ens_red, 'spline', xvals=np.linspace(0, 3.0, 51), method="xy")
CPU times: user 2.31 s, sys: 0 ns, total: 2.31 s Wall time: 2.31 s
# Convert to a spline using 21 grid points
ens_s21 = qp.convert(ens_red, 'spline', xvals=np.linspace(0, 3.0, 21), method="xy")
/afs/slac.stanford.edu/u/ek/echarles/vol2/vro/software/qp/qp/spline_pdf.py:49: RuntimeWarning: invalid value encountered in true_divide
  return (yvals.T / integrals).T
key = 0
axes_s = ens_red.plot(key, xlim=(0, 0.5), label="orig")
_ = ens_s51.plot(key, axes=axes_s, label="s51")
_ = ens_s21.plot(key, axes=axes_s, label="s21")
leg_s = axes_s.figure.legend()
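The "xy" method fits a spline through PDF values sampled at the given grid points. A rough SciPy equivalent is sketched below. It assumes an interpolating cubic spline built with scipy.interpolate.splrep (s=0) and normalized with splint, which may differ from the exact spline settings qp uses; spline_from_xy() and gauss() are hypothetical helpers.

```python
import numpy as np
from scipy.interpolate import splev, splint, splrep


def spline_from_xy(xvals, yvals):
    """Interpolating cubic spline through (x, p(x)) samples, normalized to unit area."""
    tck = splrep(xvals, yvals, k=3, s=0)          # s=0: pass exactly through every point
    norm = splint(xvals[0], xvals[-1], tck)       # integral of the spline over the grid
    return lambda x: splev(x, tck) / norm


def gauss(x):
    # Hypothetical Gaussian PDF with mean 0.8 and width 0.1.
    return np.exp(-0.5 * ((x - 0.8) / 0.1) ** 2) / (0.1 * np.sqrt(2.0 * np.pi))


# Sample on the same two grids as ens_s51 and ens_s21 above.
x51 = np.linspace(0.0, 3.0, 51)
x21 = np.linspace(0.0, 3.0, 21)
p51 = spline_from_xy(x51, gauss(x51))
p21 = spline_from_xy(x21, gauss(x21))

# Compare each spline with the true peak density at z = 0.8.
print(gauss(0.8), float(p51(0.8)), float(p21(0.8)))
```

The 21-point grid has a spacing of 0.15, wider than the Gaussian itself, so its spline has too few points across the peak to follow it closely.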
%%time
# Convert to a Gaussian mixture using 301 sample points and 3 components
ens_m3 = qp.convert(ens_red, 'mixmod', xvals=np.linspace(0, 3.0, 301), ncomps=3)
CPU times: user 1.77 s, sys: 54.2 ms, total: 1.83 s Wall time: 869 ms
%%time
# Convert to a Gaussian mixture using 301 sample points and 5 components
ens_m5 = qp.convert(ens_red, 'mixmod', xvals=np.linspace(0, 3.0, 301), ncomps=5)
CPU times: user 2.12 s, sys: 33.2 ms, total: 2.15 s Wall time: 1.11 s
key = 0
axes_m = ens_red.plot(key, xlim=(0, 0.5), label="orig")
_ = ens_m3.plot(key, axes=axes_m, label="m3")
_ = ens_m5.plot(key, axes=axes_m, label="m5")
leg_m = axes_m.figure.legend()
key = 1
axes_m1 = ens_red.plot(key, xlim=(0, 0.5), label="orig")
_ = ens_m3.plot(key, axes=axes_m1, label="m3")
_ = ens_m5.plot(key, axes=axes_m1, label="m5")
leg_m1 = axes_m1.figure.legend()
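A Gaussian mixture represents each PDF by a few amplitudes, means and widths. One way to fit such a mixture to a PDF evaluated at grid points, like the 301 sample points used above, is expectation-maximization (EM) with the PDF values acting as weights. The sketch below is a plain NumPy illustration of that approach; it is not necessarily the fitting method qp uses for 'mixmod', and fit_gmm_to_grid() and the bimodal test PDF are hypothetical.

```python
import numpy as np


def fit_gmm_to_grid(xvals, yvals, ncomps, n_iter=200):
    """Fit a 1-D Gaussian mixture to a gridded PDF by weighted EM."""
    w = yvals / yvals.sum()                       # grid values act as point weights
    x = xvals[:, None]                            # shape (npoints, 1) for broadcasting
    # Start the means at evenly spaced quantiles, with the overall width and equal amplitudes.
    means = np.interp((np.arange(ncomps) + 0.5) / ncomps, np.cumsum(w), xvals)
    overall_mean = np.sum(w * xvals)
    sigmas = np.full(ncomps, np.sqrt(np.sum(w * (xvals - overall_mean) ** 2)))
    amps = np.full(ncomps, 1.0 / ncomps)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each grid point.
        dens = amps * np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: weighted updates of amplitudes, means and widths.
        wr = w[:, None] * resp
        amps = wr.sum(axis=0)
        means = (wr * x).sum(axis=0) / amps
        sigmas = np.sqrt((wr * (x - means) ** 2).sum(axis=0) / amps)
        sigmas = np.maximum(sigmas, 1e-3)         # keep widths from collapsing to zero
    return amps, means, sigmas


# Hypothetical bimodal PDF: 70% at z = 0.4 (width 0.05), 30% at z = 1.2 (width 0.1).
xvals = np.linspace(0.0, 3.0, 301)
yvals = (0.7 * np.exp(-0.5 * ((xvals - 0.4) / 0.05) ** 2) / 0.05
         + 0.3 * np.exp(-0.5 * ((xvals - 1.2) / 0.1) ** 2) / 0.1)

amps, means, sigmas = fit_gmm_to_grid(xvals, yvals, ncomps=2)
print(amps, means, sigmas)
```

Each iteration costs one pass over all grid points per component, which is consistent in spirit with the 5-component conversion taking longer than the 3-component one above, though the actual cost depends on how qp performs the fit.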